One of the specific recommendations in the
Retrospective Conversion (RECON) feasibility report (ED 032 895) was
that a pilot project be established to test various conversion
[...] costs. The accomplishments of the pilot project are discussed in
detail in this document. (Author/SJ)

Foreword



Since March 1969 the Library of Congress
has been converting its bibliographic records
for currently cataloged English-language
monographs into machine-readable form for
dissemination to the library community through
the MARC Distribution Service. During fiscal 1972
this program was expanded to include motion
pictures and filmstrips. Monograph records in
French will be added in fiscal 1973, provided
the necessary funding is available, and plans for
future expansion include adding records in
German, Spanish, and Portuguese. Thus, the
prospects for centralized conversion of catalog
records for current materials are encouraging.

There has also been widespread interest in
centralized conversion of retrospective records.
The Library of Congress, whose concerns in this
respect include both its own requirements and
those of the library community, proposed to the
Council on Library Resources that a study be
conducted to determine the problems associated
with centralized conversion of retrospective
catalog records and distribution of these
records from a central source. Funds to support
such a study were granted to the Library, and
direct responsibility was assigned to the RECON
(Retrospective Conversion) Working Task
Force. The task force's major conclusions and
recommendations were presented in a report
entitled Conversion of Retrospective Catalog
Records to Machine-Readable Form: a Study of the
Feasibility of a National Bibliographic Service.
One recommendation was that a pilot project
be undertaken to test empirically the techniques
suggested in the feasibility study and, at the
same time, to convert a useful body of data.
Proposals were submitted to the Council on
Library Resources and the U.S. Office of
Education, and these organizations agreed to provide
support for both the pilot project and the
continuation of the activities of the RECON Working
Task Force.

Most of the people who have served on the
advisory committee and task force for the RECON
feasibility study agreed to participate in the
RECON Pilot Project and are to be commended
for continuing their contributions to a project
of national scope.

This report describes the pilot project
conducted by the Library of Congress staff. A
subsequent publication will present the results of
the studies conducted by the RECON Working
Task Force. In light of the problems
encountered during the pilot project, the prospects for
a large-scale retrospective conversion activity do
not appear encouraging at present. Nevertheless,
the results of the project have far-reaching
implications for the conversion of current
material and for future activities, in both manual and
machine systems, of the library community. The
profession is urged to study this report and to
comment on the findings so that future planning
and implementation will continue to be
responsive to the most critical requirements of
libraries and their users.



John G. Lorenz 

Deputy Librarian of Congress 

Officer-in-Charge, RECON Pilot Project

Chapter 1



Introduction 



Availability of machine-readable catalog
records from a central source has long been
considered a necessary condition for effective
application of computer techniques in libraries.
A significant step in this direction was taken in
November 1966, when the Library of Congress
began distributing MARC records for
English-language monographs as part of the MARC Pilot
Project. The success of the pilot project led to
the implementation of the MARC Distribution
Service in March 1969, and since that time over
60 subscribers have received approximately
200,000 MARC records representing the current
English-language monograph cataloging at the
Library of Congress.

When the MARC Distribution Service
expands its coverage to catalog records for
foreign-language monographs and for other forms
of materials, libraries will be able to obtain
machine records for a large number of their
current titles. Obtaining machine-readable
data for retrospective cataloging, however,
remains a very serious problem. Recognizing the
need for more research in this area, the Council
on Library Resources provided funds to the
Library of Congress, and in November 1968,
a working task force of librarians and systems
analysts representing various types of libraries
began a study of the feasibility of converting
retrospective catalog records, which became
known as RECON (Retrospective Conversion).
The final report of the RECON Working Task
Force was published by the Library of
Congress in June 1969.¹

The RECON feasibility report addressed itself
to the following areas: 1) the state-of-the-art
of hardware and software applicable to
large-scale conversion, storage, and retrieval of
retrospective bibliographic information; 2) the
organizational and administrative aspects of a
conversion project, including identification of
the most suitable existing file for conversion,
determination of which segments of that file
should have the highest priority for conversion,
and development of an effective methodology
to accomplish the tasks associated with the
conversion process; 3) costs of hardware,
software, and manpower as well as timing and
funding for such a project; and 4)
identification of areas that require intensive additional
study. The report also included analysis of 1)
user needs for retrospective cataloging data; 2)
means of maintaining standardization of the
format for machine-readable records to allow
libraries to exchange information in this form;
and 3) systems design and software required
to create, maintain, and disseminate
information from a large data base.

In its original feasibility study, the RECON 
Working Task Force reached the following 
conclusions:

1) The MARC Distribution Service should be
expanded to cover all languages and all forms 
of material as rapidly as resources and tech- 
nology allow. Retrospective conversion of any 
category of material should not take place until 
that category is being converted on a current 
basis. 

2) An early goal of library automation efforts 
should be the conversion of some portion of 
retrospective records to machine form. 

3) Standardization of bibliographic content
and machine format is necessary for a national
bibliographic data base; the standards for
converting retrospective records should be the
same as those for current records.

4) Highest priority for retrospective
conversion should be given to records most likely to
be useful to the largest number of libraries;
subsequent priorities should also be
determined by the same criterion.

5) Because decentralized conversion would be 
more costly and unlikely to meet the require- 
ments for standardization, large-scale conver- 
sion should be undertaken as a centralized 
project under the direction of the Library of
Congress. 

One of the specific recommendations in the
RECON feasibility report was that a pilot project
be established to test various conversion
techniques, ideally covering the highest priority
material (English-language monograph records
from 1960-68). In August 1969, a two-year
pilot project was initiated with funds provided
by the Council on Library Resources, the U.S.
Office of Education, and the Library of
Congress. The grants from the Council and the
Office of Education also included funds for
the RECON Advisory Committee and support
for several research projects to be carried out
by the RECON Working Task Force.

The advisory committee, whose role was that
of a sounding board for the Library of
Congress and the working task force, met twice
during the pilot project. The committee
members expressed their opinions on the work in
progress, recommended changes in the
emphasis or direction of the project, and reported on
activities in their sphere of interest that had
implications for RECON.

The present report is oriented toward the 
work of the project as a whole rather than to- 
ward individually funded activities. The pilot 
project conducted at the Library by LC staff 
members covered five major areas: 

1) Testing of techniques postulated in the
RECON report in an operational environment by
converting English-language monographs
cataloged in 1968 and 1969 but not included in the
MARC Distribution Service.

This phase of the project partially satisfied the
recommendation in the RECON feasibility report
to the effect that the initial conversion effort be
limited to English-language monograph
records issued from 1960 to 1968. The work
performed during this phase included the training
of RECON editors and typists, selection of
records for conversion from Card Division card
stock, modification of records already in
machine-readable form (MARC I and MARC II
practice records) for inclusion in the RECON
data base, comparison of records from card
stock and from the machine-readable data files
against the LC Official Catalog and updating
of the records when necessary, inputting into
the MARC system records that had been
manually edited and records that had received no
editing preparation but were keyed for
processing by the format recognition programs, and
analysis of production costs by function to
determine cost per record. Production was
handled by a new unit in the MARC Editorial
Office, the RECON Production Unit.

2) Development of procedures and computer
programs to implement format recognition.

The format recognition technique was
described in an appendix to the RECON feasibility
report.² Format recognition is a machine
process that assigns content designators and fixed
field codes to the bibliographic record by
analyzing punctuation, keywords, data content,
etc. Content designators are the tags,
indicators, and subfield codes that identify data
explicitly for machine manipulation. Fixed
fields contain such elements as codes to
indicate language, country of publication, type of
publication, etc. The feasibility report, which
was written before the first format recognition
feasibility study was completed, concluded
that "partial editing combined with format
recognition processing is a promising
alternative to full editing."³ Shortly after publication
of this report, emphasis was shifted to an
approach using format recognition processing
without previous editing. The preliminary
results were promising and indicated that the
conversion of catalog records could be
expedited by reducing the amount of human
intervention required. The pilot project
concentrated on the research to develop these
techniques, to implement procedures and
programs for English-language records, and to
expand format recognition to include records
in other languages.

3) Analysis of techniques for the conversion of
older English-language materials and titles in
foreign languages using the roman alphabet.

The RECON feasibility report had noted the
additional complexity of converting
foreign-language materials to machine-readable form.⁴
Since the production effort was limited to the
conversion of recent English-language
monographs, a separate phase of the project was
instituted to isolate and analyze problems
associated with the conversion of records in other
languages and cataloged according to other
conventions or cataloging rules. The work
performed in this phase included selection of a
valid sample of titles that would also provide
data for other LC projects, as well as the
editing and typing of a sample of French and
German monograph records for test purposes.

4) Monitoring of the state-of-the-art of input
devices that would facilitate conversion of a
large data base.

The RECON report considered several types of
input devices in an analysis of the unit cost
per record for various technical alternatives.⁵
The present study included a determination of
whether significant advances in equipment
that would accommodate bibliographic data
had been made since publication of the report.
In this phase of the project, surveys were
conducted of keyboard devices, two of which were
tested in a production environment,
direct-read optical character readers (OCR), two of
which were tested on the vendors' sites, and
cathode ray tube (CRT) terminals. The use of a
mini-computer on-line for MARC input
functions was also investigated.

5) A study of microfilming techniques and
their associated costs.

The RECON report evaluated several files as
candidates for a retrospective conversion
effort.⁶ The Library of Congress Card Division
record set, used in conjunction with the LC
Official Catalog, was selected as the best file
to meet the criteria established by the working
task force. Because the record set is a "high
use" file which cannot be withdrawn in whole
or in part for any substantial period of time,
microfilming was suggested as the least
disruptive method of securing records for conversion.
This phase of the project postulated four
alternative procedures, established microfilming
requirements to test the specifications for each
procedure, and prepared cost estimates for
each alternative.
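
The format recognition process described under area 2 above can be illustrated with a minimal sketch. This is not the Library's program: the two patterns, the fallback rule, the function name `assign_tags`, and the sample card text are all assumptions made for illustration. The tags 100, 245, and 260 are the MARC tags for a personal-name main entry, the title statement, and the imprint.

```python
import re

# Hypothetical rules: each pairs a MARC tag with a test on the raw text.
# An actual format recognition program would analyze punctuation, keywords,
# and data content far more thoroughly than these two patterns do.
RULES = [
    # Imprint: place, publisher, four-digit year, e.g. "New York, Wiley, 1968."
    ("260", re.compile(r"^[^,]+,\s*[^,]+,\s*\[?c?\d{4}\]?\.?$")),
    # Personal-name main entry: "Surname, Forename" with optional dates.
    ("100", re.compile(r"^[A-Z][\w'-]+,\s+[A-Z][\w .'-]*(,\s*\d{4}-(\d{4})?)?\.?$")),
]


def assign_tags(fields):
    """Return (tag, text) pairs for unedited card fields, in input order."""
    tagged = []
    for text in fields:
        text = text.strip()
        for tag, pattern in RULES:
            if pattern.match(text):
                tagged.append((tag, text))
                break
        else:
            # Fallback assumption: an unmatched field is treated as the title.
            tagged.append(("245", text))
    return tagged


sample = [
    "Smith, John, 1920-",
    "Library automation; a survey.",
    "New York, Wiley, 1968.",
]
for tag, text in assign_tags(sample):
    print(tag, text)
```

The point of the sketch is only that tags are inferred from the shape of the typed data rather than supplied by an editor, which is why accurate keying matters so much to the technique.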

The accomplishments of the pilot project are 
discussed in detail in the sections that follow. 

Notes

¹ RECON Working Task Force. Conversion of Retrospective
Catalog Records to Machine-Readable Form: a
Study of the Feasibility of a National Bibliographic
Service (Washington, Library of Congress, 1969).
230 p.

² Ibid., p. 109-179.
³ Ibid., p. 179.
⁴ Ibid., p. 79, 82.
⁵ Ibid., p. 49-55.
⁶ Ibid., p. 20-38.

Chapter 2



Summary and Conclusions



The major findings of the RECON Pilot
Project are as follows:

1) Format recognition applied to unedited
records has proved to be a practical computer
technique, and the need for human editing of
records before they are input has been
eliminated. The costs of keying and proofing for
format recognition remain essentially the
same as those for the processing of fully edited
records, but it appears that format recognition
will permit a reduction of about 12 percent in
the manpower cost of creating MARC/RECON
records. An additional cost reduction will
result from the fact that the machine time for
format recognition processing is less than that
for the format edit and content edit processing
required for fully edited records.

2) The preferred device for original input of
MARC/RECON records is the IBM Magnetic Tape
Selectric Typewriter (MTST). No other device
met the Library's keying requirements (easy
accommodation of variable record lengths and
the expanded character set) with a concurrent
reduction in cost, either directly or through an
increase in production capacity. Evaluation of
CRT devices for on-line correction procedures
led to the selection of the Irascope Model LTE
as having the most desirable characteristics
for the MARC/RECON operation. It was
determined that mini-computers offer no gain
either technically or economically for input of
fully edited records in the LC environment. A
reassessment of this finding may be in order,
however, in the context of format recognition
processing. Since the success of this technique
depends on accurate typing, greater flexibility
in correcting simple typing errors before
processing would promote greater accuracy in
machine editing. No direct-read OCR device
was found that would perform adequately in
converting LC cards to machine-readable form.

3) The most efficient means of producing
source documents from the LC Card Division
record set is to film all cards in a given series
against a worksheet form to produce hard
copy via Xerox Copyflo printout. The desired
subset of records is then selected for
conversion to machine-readable form.

4) Processing of older catalog records and
those in foreign languages involves
significantly more complex problems, and hence
greater conversion costs, than those
encountered in the processing of current
English-language titles.

5) Many practical difficulties are associated
with the conversion of retrospective catalog
records on a large scale. The production rates
of the pilot project were significantly lower
than was anticipated in the RECON feasibility
study. Although some of the problems were
attributable to the experimental character of
the project, there is abundant evidence that
recruiting, training, and supervision of the
staff in such an endeavor are formidable tasks.

6) The lowest RECON unit cost that can be
anticipated is $3.06 for an unedited record
processed by format recognition. Even if this
rate were to remain constant over a long
period, it would cost more than $900,000 to
convert the estimated 300,000 English-language
records issued in the 1960-67 period.
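
The arithmetic behind finding 6 can be checked with a short calculation. The unit cost and record count are taken from the finding; the function and variable names are illustrative only, and the constant per-record rate is the same simplifying assumption the finding itself makes.

```python
from decimal import Decimal

UNIT_COST = Decimal("3.06")   # dollars per unedited record (from finding 6)
RECORD_COUNT = 300_000        # estimated English-language records, 1960-67


def conversion_cost(unit_cost, records):
    """Total cost in dollars at a constant per-record rate."""
    return unit_cost * records


total = conversion_cost(UNIT_COST, RECORD_COUNT)
print(f"${total:,.2f}")  # prints $918,000.00
```

The product, $918,000, is consistent with the statement that conversion would cost more than $900,000.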
Chapter 3



RECON Production 



Background 

The RECON Working Task Force evaluated
several strategies for conversion of
retrospective catalog records. From the standpoint of
completeness, accuracy, and quality, the
Library of Congress Official Catalog was
considered the most suitable file for use in
retrospective conversion. Various problems, however,
are encountered in using the master records
from the Official Catalog as input for a
conversion project. The Official Catalog contains
over 12 million cards, including main and
added entries, name authority records, series
treatment cards, and other types of control
records. Searching this file for all or part of
the four million discrete catalog records
produced by the Library of Congress since 1898
would be costly and time-consuming. In
addition, many of the master records are
handwritten or have handwritten changes or
additions and are thus very difficult to use in a
conversion process.

For these reasons, it was decided that the
actual catalog records should be obtained from
the Card Division record set, which is a master
file of all printed cards produced by the
Library since 1898. The file is arranged by year
(the first two digits in the LC card number)
and then by the sequential numbers that
follow. Since the record set is used heavily, the
RECON Working Task Force recommended
microfilming of the cards as the best means of
providing source documents for retrospective
conversion with minimal disruption of Card
Division operations.
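
The arrangement of the record set, by year prefix and then by sequential number, amounts to a two-part sort key. The sketch below assumes card numbers written in the hyphenated form "68-1234"; the sample numbers are invented, and the 7 series numbering is not modeled.

```python
def card_number_key(card_number):
    """Split a card number such as '68-1234' into (year prefix, serial)."""
    prefix, serial = card_number.split("-", 1)
    return (int(prefix), int(serial))


# Invented sample numbers, deliberately out of order.
cards = ["69-205", "68-10431", "68-977", "69-12"]
ordered = sorted(cards, key=card_number_key)
print(ordered)
```

Comparing the serial part as an integer rather than as text keeps 68-977 ahead of 68-10431, which a plain string sort would not.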

For the pilot project production efforts, it
was considered more expedient to obtain the
necessary records (in the 1968, 1969, and 7
series of card numbers) from card stock
rather than by microfilming the record set.
These cards were compared with the
corresponding main entries in the LC Official
Catalog for any changes or additions not reflected
on the cards from stock. Although printed
cards sold by the Library are not always as
up to date as the cards in the Official Catalog,
such a limitation was considered undesirable
for machine-readable records. The Library
itself would be unwilling to accept machine
records less accurate than those in the Official
Catalog, and a national bibliographic store
would also need records of the highest quality
possible.

The original estimates of records to be
converted, based on LC catalog statistics for 1968
and the first three months of 1969, were:

1969 and 7 series                         22,000
1968 series                               [illegible]
1968 and 1969 machine-readable records    [?]6,000
TOTAL                                     85,000

The machine-readable records consisted of 
those converted during the MARC Pilot Project 
(MARC I) and those converted before the 
MARC Distribution Service was begun (MARC
II practice records). 

As the selection of records eligible for
conversion progressed, it became obvious that the
number of records to be converted had been
overestimated. Many titles reported in the
cataloging statistics for 1968 apparently were
not completely processed until 1969 or later
because of backlogs, and the cataloging output
for the first three months of 1969 consisted
primarily of titles with 1968 card numbers.
Because of this lag, many more of these
records in the 1968 and 1969 card series were
received for input as current records for the
MARC Distribution Service. The number of
records actually converted was:

1969 and 7 series                         8,641
1968 series                               [illegible]
1968 and 1969 machine-readable records    15,518
TOTAL                                     [illegible]

Additional records to make up the deficit may 
be obtained through other sources. 

Production

Records with 1969 and 7 series card numbers
were manually edited at the Library and keyed
by a service bureau. These records were
distributed to 47 MARC subscribers early in 1971.
There was no charge for the records or their
duplication; instead, subscribers were
requested to send tapes on which the records
could be duplicated. Approximately [?],500
records in the 1968 card series were also manually
edited at the Library and keyed by a service
bureau. Another 6,000 were both edited and
input at the Library. The remainder have been
keyed for subsequent processing by the format
recognition program.

Records already in machine-readable form
but requiring modifications to make the
content designators identical to those in the data
base for the MARC Distribution Service have
been converted by special programs. MARC I
records in the 1968 card series have been
converted to MARC II, proofread, compared with
the Official Catalog, and updated. Two MARC
practice tapes were processed by programs
tailored to the modifications required for each
tape. The modifications were necessary
primarily because of changes made to the format
subsequent to the time that these records were
input. These records were also compared with
the Official Catalog and updated.

Staff 

Experience at the Library of Congress has
demonstrated that staff members assigned the
task of preparing catalog records for
conversion to machine-readable form must be
familiar with cataloging fundamentals. In
assigning content designators or proofing, knowledge
of the cataloging rules is necessary to make
the correct decisions for machine
identification of cataloging information. Because of the
large number of new staff members involved
in RECON production, it was decided that
formal instruction would be more efficient than
on-the-job training.

Classes were conducted in elements of
cataloging, MARC editing procedures, and
correction procedures. Additional sessions were held
on LC subject headings and classification, LC
filing rules, Dewey decimal classification,
workflow through the MARC Editorial Office,
and the MARC character set. Three series of
classes were held during the period of the pilot
project. Formal instruction lasted from 12½ to
19 days, depending on the size of the class and
the aptitude of the pupils. After the initial
training period was completed, the editors'
work was reviewed for at least six months, and
if their work was satisfactory, they were
promoted to independent editor status.

Instruction was provided by staff members
from the MARC Development Office and the
MARC Editorial Office. Personnel from other
divisions in the Processing Department were
also invited to give briefings in their areas of
specialization.

The staff of editors varied in size during the
course of the pilot project, and the rate of
turnover was high. Since all of the positions
were temporary, it was sometimes difficult to
find qualified individuals for the jobs. Eleven
editor positions were originally established,
but this number was reduced to nine with the
creation of two verifier positions early in 1970.
That number was reduced to eight by the end
of the project.

Verifiers review records, with both the 
proofsheets and the input worksheets in hand, 
after the editors have completed the initial 
proofreading. Verifiers are required to have 
been independent editors for at least six 
months and to have met specific standards in 
the quality and quantity of their editing and 
proofreading. They spend a minimum of six 
months as trainees before becoming independ- 
ent verifiers. Since promotion to the position 
of verifier is based on satisfactory performance 
in MARC editorial functions, no special verifier 
training program is needed. The two verifier 
positions were filled in January and May of 
1970. 

RECON typists were assigned to the
Keyboarding Unit of the MARC Editorial Office,
and initial training involved typing of current
MARC records. At the end of a six-month
training period, those typists who met the quality
and quantity standards were promoted to the
position of independent typist. The RECON
typing staff ranged in size from one to three
persons during the pilot project.

Supervision of RECON editors and verifiers
was the responsibility of the head of the RECON
Production Unit. In October 1970, an
additional supervisor was added to the staff, which
also included a clerical assistant for Xeroxing.
To maintain an even workload, close liaison
was established among the different units
within the MARC Editorial Office and with the
research staff in the MARC Development Office.

As a result of two Government-wide salary
increases during the course of the pilot
project, funds from the Council on Library
Resources grant for RECON production were
expended by June 30, 1971, and the Library
assumed the costs of completing the conversion
of records in the 1968 card series. The RECON
Production Unit of the MARC Editorial Office
was dissolved, and some of its staff members
were absorbed into current MARC operations,
although they continued to work on conversion
of the 1968 records.

Card Selection 

The Card Division supplied the RECON Pro- 
duction Unit with printed cards representing 
each LC card number in the 1968, 1969, and
7 series. Cards were drawn from stock, begin- 
ning with the cards in the 1969 and 7 series. 
Gaps in the sequence of card numbers were 
searched by the Card Division staff in the 
record set. If the gap represented the number 
of a printed card that was not in stock, the 
card from the record set was reproduced. Form 
cards were inserted to indicate cards missing 
from the record set or cards that had not yet 
been printed. 

Cards sent to the RECON Production Unit
were then subjected to additional selection
procedures to identify those records that were
within the criteria established for the pilot
project, i.e., English-language monographs.
The determination of whether an item was in
English was based on the text rather than the
title page. An anthology of literature in
Spanish with a title page in English, for example,
was not included in RECON; a book with text in
English but a title page in French was.
Determination of the language of the text depended
on the presence of specific information on the
printed card. For a multilingual book
(complete text in more than one language), the
language of the first title determined its
eligibility for RECON.

Atlases, which are classified below G3000,
were included but not single maps or sets of
maps, which are classified as G3000 or above.
Music and music scores were excluded, but
books about music were included. Other
categories excluded were records for motion
pictures, filmstrips, and other kinds of materials
that were not considered books. Records
representing serials were also excluded. Those
labeled "MARC" in the lower right-hand corner
of the printed card were excluded since they
were already in the data base of the MARC
Distribution Service.
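
The selection criteria above can be read as a simple eligibility test. The sketch below is illustrative only: the `Card` fields, the form labels, and the function name are assumptions, and the language field is taken to reflect the text (or, for a multilingual book, the first title) as already determined from the printed card.

```python
from dataclasses import dataclass


@dataclass
class Card:
    text_language: str       # language of the text, not of the title page
    form: str                # e.g. "book", "atlas", "map", "music", "serial", "film"
    has_marc_notation: bool  # "MARC" printed in the lower right-hand corner


# Forms treated as monographs for selection purposes (books about music
# are simply books; atlases were included, single maps were not).
INCLUDED_FORMS = {"book", "atlas"}


def eligible_for_recon(card):
    """Apply the pilot project's selection criteria to one printed card."""
    if card.has_marc_notation:
        return False  # already in the MARC Distribution Service data base
    if card.form not in INCLUDED_FORMS:
        return False  # maps, music, serials, films, and the like
    return card.text_language == "English"


# Spanish anthology with an English title page: excluded.
print(eligible_for_recon(Card("Spanish", "book", False)))
# English text with a French title page: included.
print(eligible_for_recon(Card("English", "book", False)))
```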

The cards selected were kept in LC card
number sequence and were then checked
against a print index of card numbers for
records in machine-readable form. This
procedure was necessary because catalog records
converted into machine-readable form before
the beginning of the MARC Distribution Service
in March 1969 did not have the special MARC
notation on the printed cards. Since March
1969, the word "MARC" has been printed in
the lower right-hand corner of the card for
titles which are also available in
machine-readable form. This notation ensures that
revisions or changes on these cards will be
forwarded to the MARC Editorial Office to update
the MARC data base.

Each number listed in the print index was
accompanied by a source code indicating the
machine-readable data base in which the
record resided. Five codes were used to designate
the MARC I data base, first MARC II practice
tape, second MARC II practice tape, MARC II
data base, or MARC II residual data base.¹

If the RECON editor found a match on the
print index, the appropriate source code was
added to the printed card, and the card was
placed in a separate file. The remaining cards
eligible for RECON input were reproduced on
input worksheets. Cards not selected for RECON
production were saved for possible future use.
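
The check against the print index is essentially a lookup keyed on card number. In the sketch below the index is modeled as a dictionary; the code letters "A" and "C", the sample card numbers, and the function name are hypothetical, since the report does not give the actual source code values.

```python
def route_cards(card_numbers, print_index):
    """Separate cards already in machine-readable form from new input.

    print_index maps a card number to its source code (hypothetical values).
    Returns (matched, to_worksheet): matched cards carry their source code,
    the rest go on to be reproduced on input worksheets.
    """
    matched, to_worksheet = [], []
    for number in card_numbers:
        code = print_index.get(number)
        if code is not None:
            matched.append((number, code))
        else:
            to_worksheet.append(number)
    return matched, to_worksheet


# Invented index entries and card numbers for illustration.
index = {"68-100": "A", "68-102": "C"}
matched, to_worksheet = route_cards(["68-100", "68-101", "68-102"], index)
print(matched)
print(to_worksheet)
```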

Form cards representing cards not yet
available from the Card Division were filed
separately. The Card Division supplied the missing
cards as they became available, and the record
selection process was then applied to these
cards and eligible records reproduced on
worksheets.

Contractor Input 

RECON records from the entire 1969 and 7
series and a portion of the 1968 series were
input by a service bureau. Because the input
worksheets were to leave the Library, stringent
controls were necessary. The location, in and
out of the Library, of each record had to be
known so that worksheets could be
reconstituted in the event of any loss. At two-week
intervals, the contractor picked up new edited
worksheets and corrected proofsheets and
returned the worksheets and the corrected
proofsheets from the previous cycle, together with
a magnetic tape.

The contractor used IBM Selectric
typewriters equipped with an optical character
reader (OCR) typing mechanism. The
hard-copy sheets prepared on this equipment were
run through a Farrington Optical Scanner.
The output from the scanner, in the form of a
magnetic tape, was processed by the
contractor's programs to produce a tape in the MARC
pre-edit format.² This tape was delivered to
the Library for processing through the rest of
the MARC system.

In April 1970, a comparison was made of
error rates in RECON records typed by the
contractor and current records typed in the
Library. In analyzing the results, it was found
that the contractor's errors were generally
more serious, e.g., omission of a field, omission
of a record, or an incorrect tag. The general
conclusion reached was that the overall
accuracy of the two groups was about the same but
that the contractor was handicapped by not
being able to answer typists' questions or to
give special instructions during keying. Since
many of the contractor's errors occurred in the
input of diacritical marks and special
characters, the editors subsequently identified these
characters by their hex code equivalents for
ease of input.

Problems of RECON vs. Current Records 

Because the RECON Pilot Project used printed cards as source
documents, the editing process was subject to certain complications
which are not associated with the processing of current records,
for which the source document is a manuscript card.³ It was
expected that editing would be easier with a worksheet produced
from a printed card rather than a manuscript card because the
latter includes handwritten data, instructions to the printer, etc.
Experience, however, showed that Xeroxed printed cards were often
difficult to read because of the confusion of such characters as
e, o, c, a, and punctuation marks. If these were not clarified by
an editor, legibility became a problem for the typist.

Inaccuracies on printed cards may be due to 
errors in either cataloging or printing. Since 
assignment of content designators can be made 
without ascertaining the correctness of the 
data in the field, errors may be overlooked 
during the editing process. Problems that arose 
in this connection were resolved by referral to 
the principal subject or descriptive cataloger. 
An analysis of records with cataloging/printing errors showed that
144 of approximately 20,000 records (0.72 percent) contained 151
errors that required cataloging decisions. It is
likely that the actual occurrence of such errors 
is somewhat higher, since some errors remain 
unidentified. 

Differences in cataloging rules and proce- 
dures are critical problems in the conversion of 
older records and foreign-language records 
originating from shared cataloging copy. An 
analysis of these problems is presented in 
Chapter 6. 

Since the book is not examined in the retrospective conversion
process, difficulties arise in assigning certain fixed field codes
from information on the printed card alone. In converting current
catalog records to machine-readable form, many of these codes are
assigned by descriptive or subject catalogers who have the book in
hand. A RECON editor may encounter problems, for example, in
ascertaining the proper language codes for a multilingual
publication because of ambiguities in the title paragraph or in
the notes. He may also have difficulty determining whether a
particular title is a conference publication or a biography. It
was concluded that editors and verifiers must devote greater
attention to these problems than is required in the editing of
current MARC records.






Catalog Comparison 

During the RECON Pilot Project, all records were compared against
the Official Catalog. It had originally been thought that
additional staff members would be hired for this task, but it
became apparent that a shortage of qualified staff and the
relatively short timespan of the project made such hiring
impractical. Catalog comparison was instead assigned to the RECON
editors, who already knew how to write in corrections and required
only minimal additional training for the work.

Two RECON editors participated in an experiment to test eight
possible methods of catalog comparison. The alternatives
considered involved the following activities: 1) printouts of
verified records arranged and checked in alphabetical order; 2)
proofsheets (already proofed) arranged and checked in card number
order; 3) proofsheets (not proofed) arranged and checked in card
number order; 4) proofsheets (already proofed) arranged by card
number but checked by mental alphabetization; 5) proofsheets (not
proofed) arranged by card number but checked by mental
alphabetization; 6) worksheets (before editing) arranged by card
number but checked by mental alphabetization; 7) worksheets
(before editing) arranged and checked in alphabetical order; or 8)
worksheets (before editing) arranged and checked in card number
order.

A group of 200 records was used for each of the proposed methods.
For alternatives 2-8, the records were separated into batches of
20. The editors searched the Official Catalog, made the necessary
corrections, and recorded the time spent as well as the number of
changes made. Figure 3-1 shows the average number of records
checked per hour using each of the proposed methods. Table 3-1
gives the estimated cost per record for five of the methods, based
on the prevailing salary rates and other costs at the time the
test was made.
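The relationship between hourly production and cost per record can be illustrated with a short sketch. The hourly salary of $4.40 used below is an assumption inferred from the published figures (it is not stated in the report), and the function names are hypothetical. Under that assumption, the unadjusted cost per record is roughly the hourly salary divided by the number of records checked per hour, and the adjusted cost is the sum of the unadjusted cost and the additional cost components.

```python
# Hypothetical hourly salary, inferred from Table 3-1; not stated in the report.
ASSUMED_HOURLY_SALARY = 4.40


def unadjusted_cost(records_per_hour, hourly_salary=ASSUMED_HOURLY_SALARY):
    """Approximate salary cost of comparing one record against the catalog."""
    return hourly_salary / records_per_hour


def adjusted_cost(unadjusted, leave, supervision, fringe):
    """Total cost per record once the additional cost components are added."""
    return unadjusted + leave + supervision + fringe


# Method 4 (20 records per hour), using the component costs given in Table 3-1.
method_4_unadjusted = round(unadjusted_cost(20), 3)
method_4_total = round(adjusted_cost(0.220, 0.037, 0.104, 0.027), 3)
print(method_4_unadjusted, method_4_total)
```

The same division gives about $.100 per record at 44 records per hour, which suggests why the faster methods were markedly cheaper.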

The editors participating in the experiment found that the task of
arranging worksheets in alphabetical order by main entry was time
consuming and tedious. They also discovered that checking the
Official Catalog with the records arranged in order by card number
was not as difficult as anticipated because the entries tended to
fall into a rough alphabetical order. Even mental alphabetization
(in this context, searching the catalog alphabetically by main
entry although the records in the batch were in order by card
number) did not substantially increase searching time. They did
find catalog comparison easier when using a worksheet, which
consisted of a copy of the printed card, because it was easier to
spot revisions. The printing format of the proofsheet, the
possibility of the typists' omitting fields, and the fact that many
of the diacritical marks and special characters are represented by
different characters on the print train used to produce the
proofsheets made the proofsheets unlike the printed card in
appearance.

Figure 3-1. Hourly rates for eight methods of catalog comparison ¹

[Bar chart not reproduced. The bars correspond to the following
methods.]

Method 1: PRINTOUT checked in ALPHABETICAL order
Method 2: PROOFSHEETS (already proofed) checked in WORKSHEET order
Method 3: PROOFSHEETS (not proofed) checked in WORKSHEET order
Method 4: PROOFSHEETS (already proofed) checked by MENTAL ALPHABETIZATION
Method 5: PROOFSHEETS (not proofed) checked by MENTAL ALPHABETIZATION
Method 6: WORKSHEETS before editing (not input) checked by MENTAL ALPHABETIZATION
Method 7: WORKSHEETS before editing (not input) checked in ALPHABETICAL order
Method 8: WORKSHEETS before editing (not input) checked in WORKSHEET order

¹ Taken from Catalog Comparison: An Evaluation (an internal
document prepared for the MARC Development Office).

Table 3-1. Adjusted cost figures for catalog comparison, by method

           Average                      Additional costs ²
           number of               ---------------------------------   Total
           records    Unadjusted   Annual and                Fringe    adjusted
Method     per hour   cost ¹       sick leave  Supervision   benefits  cost

4             20       $.220         $.037       $.104        $.027     $.388
3 and 7       33        .132          .024        .063         .016      .235
8             44        .100          .017        .047         .012      .176
1             50        .087          .016        .041         .011      .155

¹ Taken from Catalog Comparison: An Evaluation (an internal
document).
² Based on assumptions used in the original RECON report.

Although the results of the test indicated that method no. 1 was
slightly faster than the other methods, it would require
substantial modifications to the present MARC system if actually
implemented. Additional sorting and printing would be necessary to
produce hard copy if catalog comparison were performed after all
records eligible for RECON had been processed and verified. Since
many records would have to be corrected after comparison to reflect
the changes found in the Official Catalog main entry, additional
updating cycles would also be required. If catalog comparison were
performed whenever a batch (approximately 4,000) of verified
records were available, extensive system changes or the creation
of multiple data bases would be necessary in addition to the
sorting, printing, and updating cycles. Since these additional
maintenance and processing routines would require more computer
time, the total cost of method no. 1 would be higher than that
depicted in Table 3-1. In addition, the editors found the printouts
more difficult to use than the worksheets when doing catalog
comparison. The decision was therefore made to implement method
no. 8, under which unedited worksheets in order by card number
were also checked in the same order.



Original plans for catalog comparison also included the writing of
a new print program to produce a printout with records in two
columns. One column would contain records in numeric order by LC
card number, and the other column would consist of records by main
entry. By cutting the printout in half vertically, the alphabetical
sort could be used for catalog comparison, and the numerical sort
could be used for proofing. Before coding for the two-up print
program was begun, however, the catalog comparison experiment
showed that searching the Official Catalog with a printout in
alphabetical order did not substantially increase production. Since
the editors preferred using the worksheets rather than the
printouts, it was decided that the new print program would not be
necessary.

Procedures for catalog comparison were worked out for the RECON
Production Unit. During the comparison process, "MARC" was written
on the main entry cards in the Official Catalog to ensure that
corrections or revisions to the card are forwarded to the MARC
Editorial Office. If the corresponding record was not found in the
Official Catalog, a special cataloging certification code was added
to the worksheet by the editor. An additional fixed field was
included in the LC internal processing format to carry the catalog
certification information. This field is not part of the MARC
communications format for books. All worksheets were input
regardless of whether the records had been certified in the
Official Catalog; in the future, records that have not been
certified in the Official Catalog can be obtained from the RECON
master data base and checked again.

Table 3-2. Data elements affected by changes in RECON records

                           1969 (403 records)     1968 (1?89 records)
Data element               Number    Percent      Number       Percent

Total                        409      100.0       [illegible]   100.0
Main entry                    81       19.8       [illegible]   [illegible]
Body of entry                 15        3.7         108           4.7
Collation                      8        2.0          89           3.9
Series statement               5        1.2          67           2.9
Notes                         31        7.6         281          12.2
Subject heading               17        4.2         180           8.0
Added entry                   54       13.2         423          18.3
Classification number ¹      165       40.3         650          28.4
Dewey number                  15        3.7          73           3.2
Added copy                     9        2.2          31           1.3
Dash entry                     2         .4           3            .1
ISBN or SBN number             7        1.7          19            .8
Price                          -         -            9            .4
Catalog card number            -         -            8            .3

¹ The high incidence of changes in classification number primarily
reflects the addition of the [illegible] to many cards.

During the RECON Pilot Project, 7,528 records in the 1969 and 7
series and 34,628 records in the 1968 series were compared against
the Official Catalog, with the following results:

1) Three hundred and thirty-five 1969 main entry records (4.5
percent) and 1,671 1968 main entry records (4.8 percent) were not
found in the catalog.

2) As a result of catalog comparison, changes were made in 339
1969 records (4.7 percent) and 2,149 1968 records (6.5 percent). An
average of 1.1 changes per record was required.

3) Because almost as many records were not found (2,006) as were
changed (2,488), a 105-card sample of missing records was studied.
Analysis of this sample indicated that 45.7 percent of the records
originally not found would have changes on them. Cards that were
initially missing were located in the recheck for a variety of
reasons. A card out notice had been replaced by the card itself in
some cases. In other cases, an added entry pointing to the new main
entry had been filed after the first catalog comparison.

4) If figures on the number of changes required are adjusted to
take into account records originally not found, the percentages
noted in paragraph 2 increase to 6.5 percent for 1969 and 8.4
percent for 1968.

5) The uniform filing title is usually not printed on LC cards;
6.7 percent of the 1969 records and 5.6 percent of the 1968 records
required the addition of a uniform filing title. The degree of
overlap between records with Official Catalog changes and records
with added uniform filing titles was not determined. Table 3-2
shows the data elements in the catalog records that were affected
by catalog comparison.
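The adjustment described in paragraphs 2 through 4 can be retraced with a short calculation. The sketch below assumes, as an interpretation rather than a statement from the report, that the unadjusted percentages were computed against the records actually found in the catalog and that the adjusted percentages were computed against all records compared; the function name is hypothetical. Under those assumptions the published figures are reproduced.

```python
def change_rates(compared, not_found, changed, missing_share_with_changes=0.457):
    """Return (unadjusted, adjusted) change rates as percentages.

    missing_share_with_changes is the 45.7 percent estimate obtained from the
    105-card sample of records originally not found.
    """
    found = compared - not_found
    unadjusted = 100 * changed / found
    estimated_extra = not_found * missing_share_with_changes
    adjusted = 100 * (changed + estimated_extra) / compared
    return round(unadjusted, 1), round(adjusted, 1)


rates_1969 = change_rates(compared=7528, not_found=335, changed=339)
rates_1968 = change_rates(compared=34628, not_found=1671, changed=2149)
print(rates_1969)  # (4.7, 6.5)
print(rates_1968)  # (6.5, 8.4)
```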

It should be noted that catalog comparison 
was performed on relatively current records 
during the pilot project. Additional difficulties 
would occur in comparisons involving records 
in foreign languages, handwritten records 
such as the master record in the Official Cata- 
log, or older catalog records for which the 
printed card format and cataloging rules dif- 
fered from present practices. 

Notes 

¹ In the MARC system, the residual data base contains records in
the process of correction and verification. Once the records are
declared free of errors, they are transferred to the master data
base.

² The MARC input system consists of four major programs: pre-edit,
format edit, content edit, and update. Tapes received from the
contractor in a pre-edit output format could be input directly into
the format edit program.

³ The manuscript card is used at the Library of Congress to record
cataloging information and as copy for the printing of catalog
cards by the Government Printing Office.
CHAPTER 4



Format Recognition



Background 

The preparation of bibliographic data in machine-readable form
involves labeling each data element so that it can be identified by
the computer. For this purpose, the MARC format employs tags,
indicators, and subfield codes (or content designators). In the
current MARC system, these content designators are supplied by the
MARC editors before the data are typed on the MTST. The MTST tape
cassette is converted to computer-compatible tape, which is then
run through a series of computer programs to produce a proofsheet.
In the proofing process, the editor compares the proofsheet against
the original worksheet, checking for errors in editing or keying.
Corrections are retyped and processed by the MARC system programs.
A new proofsheet is produced by the computer and checked for
errors. Records that are error free or "verified" are then removed
from the work file and stored in a master file.

Since manual assignment of content designators and fixed field
information by the editor is a detailed and somewhat tedious
process, it seemed advantageous to develop a method whereby the
computer would assign the content designators for bibliographic
data by examining data strings for certain keywords, significant
punctuation, and other clues. This technique, referred to as format
recognition, was not entirely new at the Library of Congress. The
need for such a computer program had been recognized during the
planning stage of the MARC Distribution Service, but the pressure
to implement the distribution service prevented more than minimal
development of format recognition processing. The viability of such
a technique has since been proved at other institutions,
principally the Institute of Library Research at the University of
California, Berkeley, and the Bodleian Library, Oxford.

The Library began its work on format recognition with a
feasibility study conducted during the winter of 1968-69. At that
time a certain amount of editing for MARC records was being
performed by catalogers, and the study tested the possibility of
using format recognition to assign content designators not already
supplied by the catalogers. The study was divided into two parts,
the first part analyzing those fields for which the cataloger
supplied the tags and indicators but not the subfield codes and the
second part analyzing those fields for which no tagging
information was supplied by the catalogers. Detailed flowcharts of
the algorithms were prepared, and a statistical analysis of the
recurrence of types of fields was performed to provide the basis
for an estimate of the effectiveness of such a technique. The
results indicated that if a format recognition technique were used
for partially pretagged data, roughly 85 out of 100 records would
be processed without error. The results of this initial study were
encouraging enough for the Library to proceed with the development
of a format recognition project.

Logical Analysis 

Implementation of the project was divided into several tasks. In
the first task, the algorithms from the initial feasibility study
were examined again to determine how successful they would be if
there were no human editing. It was assumed that the typists would
type directly from a printed catalog card or a manuscript card. The
computer program would take the raw data and supply the necessary
content designators. The accuracy of setting fixed fields
completely by computer was also studied. The results showed that,
with accurate typing, records could be processed correctly
approximately 70 percent of the time; that is, 70 out of 100
records would be correct, and the other 30 records would have
errors in one or more fields. On the basis of these results, the
decision was made to implement format recognition using unedited
catalog records.

The second task covered several areas, in- 
cluding the development of input specifications 
for the typist. In general, these specifications 
provide for typing of the record from an 
untagged card on an input device with a type- 
writer keyboard. The information on the card 
is transcribed from left to right and from top 
to bottom. The data are input as fields, which 
can be detected by the program because each 
field ends with a carriage return and each field 
continuation is indicated by a carriage return, 
tab. Each field corresponds to a logical portion 
of the card; thus, the call number is input as a 
separate field, as are the main entry, collation, 
each note, each added entry, etc. The title par- 
agraph is input as a single field with the title, 
edition, and imprint separated by delimiters. 
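The field convention described above can be sketched in a few lines. This is an illustration only: it assumes the carriage return and tab reach the program as the characters "\r" and "\t", the function name is hypothetical, and the delimiters used inside the title paragraph are not modeled because the report does not specify them here.

```python
def split_fields(raw):
    """Split keyed input into fields.

    A carriage return ends a field; a carriage return followed by a tab
    marks a continuation of the preceding field.
    """
    fields = []
    for line in raw.split("\r"):
        if line.startswith("\t") and fields:
            # Continuation line: append to the field already in progress.
            fields[-1] += " " + line[1:].strip()
        elif line.strip():
            fields.append(line.strip())
    return fields


keyed = "PZ7.S65\rSmith, John.\rA long title /\r\tcontinued here.\r"
print(split_fields(keyed))
```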

Sixty keyword lists for English-language 
materials were compiled. The lists contain over 
2,500 keywords covering such items as U.S. 
cities, foreign cities, geographical areas, words 
frequently appearing in corporate names or 
meetings, and honorary titles used with per- 
sonal names. 

Three possible processing sequences were considered:

1) Processing of independent fields before dependent fields, e.g.,
title field before title added entry field.

2) Processing of fields in order of their occurrence; use of a
"look-ahead" technique to analyze as yet unencountered independent
fields when necessary.

3) Processing of fields in order of their occurrence; use of a
"look-back-and-fix-up" technique to modify previously encountered
dependent fields when necessary.

The third approach was selected for the logical structure of the
program.

The final product of task 2 was the documentation of logical
specifications for the entire program,¹ a simplified description of
which follows.

Gross identification of fields depends on the location of the
collation, which is the only field that is easily identified and
always present. Each of the first five fields is searched for the
presence of "p." or "v." preceded by an arabic or roman numeral. If
there is no hit, the field is searched for the presence of "cm." or
"mm." Once the collation is found and identified, the fields
preceding it can be identified. The call number is recognized as a
character string beginning with one to three uppercase letters
followed by one to four numbers. The remaining unidentified fields
preceding the collation are identified as the main entry, uniform
title, and title paragraph, depending on the number of such fields
present. For example, if there are two unidentified fields
preceding the collation, the first is tagged as the main entry and
the second as the title paragraph. If there is only one
unidentified field, it is tagged as the title paragraph.
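A minimal sketch of this gross identification follows. It is not the Library's assembler code: the regular expressions, function names, and sample card are assumptions made for illustration, and the handling of three unidentified fields (main entry, uniform title, title paragraph, in that order) is inferred from the description rather than stated in it.

```python
import re

# "p." or "v." preceded by an arabic or roman numeral, e.g. "320 p." or "xii p."
PAGES_OR_VOLUMES = re.compile(r"\b(?:\d+|[ivxlcdm]+)\s+[pv]\.", re.IGNORECASE)
SIZE = re.compile(r"\b(?:cm|mm)\.")
# One to three uppercase letters followed by one to four digits.
CALL_NUMBER = re.compile(r"[A-Z]{1,3}\d{1,4}")


def find_collation(fields):
    """Return the index of the collation among the first five fields."""
    for index, field in enumerate(fields[:5]):
        if PAGES_OR_VOLUMES.search(field) or SIZE.search(field):
            return index
    return None


def identify_leading_fields(fields):
    """Label the collation and the fields that precede it."""
    collation = find_collation(fields)
    if collation is None:
        return {}
    labels = {collation: "collation"}
    unidentified = []
    for index in range(collation):
        if CALL_NUMBER.match(fields[index]):
            labels[index] = "call number"
        else:
            unidentified.append(index)
    if len(unidentified) == 1:
        names = ["title paragraph"]
    elif len(unidentified) == 2:
        names = ["main entry", "title paragraph"]
    else:  # assumed ordering when a uniform title is present
        names = ["main entry", "uniform title", "title paragraph"]
    for index, name in zip(unidentified, names):
        labels[index] = name
    return labels


card = [
    "PZ7.S65 Ad",
    "Smith, John William, 1898-1965, comp.",
    "Adventures. New York, Harper [1968]",
    "xii, 320 p. illus. 22 cm.",
    "1. Adventure stories.",
]
print(identify_leading_fields(card))
```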

The fields following the collation are examined separately. The
characters at the beginning of each field are analyzed, and gross
tag identifiers are assigned. For example, fields beginning with an
arabic numeral (number-period-space) are identified as subject
entries. Those beginning with roman numerals are tagged as added
entries. If the first character is a quotation mark, the field is
identified as a note. When the field begins with an open bracket,
additional analysis is performed on the characters following the
bracket. If the bracket is followed by an arabic numeral, the field
is tagged as a subject entry. If the bracket is followed by an
alphanumeric combination of characters, it may be an additional LC
classification number. Using clues such as these, each field is
assigned a tag or a partial tag, and a preliminary record directory
is built.
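The clues just listed can be expressed as a small decision function. The sketch below is illustrative only; the function name, the label strings, and the exact patterns (for instance, treating a roman numeral as uppercase letters followed by a period and a space) are assumptions, and the actual program used many more clues than are shown here.

```python
import re


def gross_tag(field):
    """Assign a gross identifier to a field that follows the collation."""
    text = field.lstrip()
    if re.match(r"\d+\.\s", text):           # number-period-space
        return "subject entry"
    if re.match(r"[IVXLCDM]+\.\s", text):    # roman numeral-period-space
        return "added entry"
    if text.startswith('"'):
        return "note"
    if text.startswith("["):
        rest = text[1:]
        if re.match(r"\d", rest):
            return "subject entry"
        if re.match(r"[A-Za-z]+\d", rest):
            return "possible additional LC classification number"
    return "unidentified"


for sample in ["1. Adventure stories.", "I. Title.", '"First published in 1950."',
               "[1. Dogs - Stories]", "[PZ7.S65]"]:
    print(sample, "->", gross_tag(sample))
```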

The bulk of the program is devoted to the reexamination of each
field to provide final and complete tags, indicators, and subfield
codes. Each field is divided into groups and subgroups. A group is
roughly defined as a data string ending with a significant period.
A subgroup is a data string within the first group.

Examples (Gr. = Group; Sgr. = Subgroup):

1) Chicago.    Art Institute.
   Gr. 1       Gr. 2

2) Smith,    John William,    1898-1965,    comp.
   Sgr. 1    Sgr. 2           Sgr. 3        Sgr. 4
   (all four subgroups together form Gr. 1)

The first group and its subgroups are analyzed to provide the full
tag and the first indicator. In the first example, "Chicago" is
matched against the keyword lists, and when a match is found
identifying it as a city, the field is identified as a corporate
name entered under place. The second group is checked against the
keyword list to see if it is a form subheading. Since there is no
match, it is identified as a "b" subfield (administrative subunit)
by default.

The second example consists of one group made up of multiple
subgroups. The subgroups are scanned for a date range, whose
presence and location are strong indications that the field is a
personal name. The subgroup following the date subgroup is checked
against the keyword list and is identified as a personal name
relator. The second subgroup is checked against the honorary
titles keyword list. Since there is no hit, the second subgroup is
identified by default as a forename. Since the first subgroup
contains only one word, it is considered a single surname. Thus,
the field is finally identified as a personal name (single surname)
with date and relator subfields.
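The two examples can be traced with the following sketch. The keyword sets are tiny stand-ins for the sixty lists used by the program, every name in the code is hypothetical, and the rule used to split groups (any period ends a group) is a simplification of the "significant period" test, which the report does not spell out.

```python
import re

# Stand-in keyword lists (assumed contents, for illustration only).
CITIES = {"chicago"}
FORM_SUBHEADINGS = {"laws, statutes, etc."}
RELATORS = {"comp.", "ed.", "tr."}
HONORARY_TITLES = {"sir", "dame"}
DATE_RANGE = re.compile(r"^\d{4}-(?:\d{4})?$")


def split_groups(field):
    """Split a field into groups, each ending with a period."""
    return [group.strip() for group in re.findall(r"[^.]+\.", field)]


def analyze_heading(field):
    groups = split_groups(field)
    subgroups = [part.strip() for part in groups[0].split(",")]
    # A date range among the subgroups points to a personal name.
    for index, subgroup in enumerate(subgroups):
        if DATE_RANGE.match(subgroup.rstrip(".")):
            result = {"kind": "personal name", "dates": subgroup.rstrip(".")}
            if index + 1 < len(subgroups) and subgroups[index + 1].lower() in RELATORS:
                result["relator"] = subgroups[index + 1]
            if index >= 2:
                second = subgroups[1]
                is_title = second.lower() in HONORARY_TITLES
                result["second subgroup"] = "title" if is_title else "forename"
            one_word = len(subgroups[0].split()) == 1
            result["surname"] = "single" if one_word else "multiple"
            return result
    # Otherwise test the first group against the city keyword list.
    if groups[0].rstrip(".").lower() in CITIES:
        result = {"kind": "corporate name entered under place"}
        if len(groups) > 1:
            is_form = groups[1].lower() in FORM_SUBHEADINGS
            result["second group"] = "form subheading" if is_form else "b"
        return result
    return {"kind": "unidentified"}


print(analyze_heading("Chicago. Art Institute."))
print(analyze_heading("Smith, John William, 1898-1965, comp."))
```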

After each field is fully identified, the data are scanned for
information needed for the fixed fields. For example, if the Dewey
number field contains the notation "[Fic]" the intellectual level
indicator is set to "j" (juvenile) and the fiction indicator to
"x." If the words "autobiographical," "diary," "diaries," or
"memoirs" appear in the title field, the biography indicator is set
to "a." The date of publication field is derived from the date
subfield in the imprint. The city transcribed in the place of
publication subfield in the imprint is matched against the keyword
lists to derive the country of publication code.
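A sketch of this fixed field derivation is given below. The dictionary keys, the function name, and the two-entry city table are hypothetical; the value "j" for the juvenile intellectual level is assumed from MARC practice, and the country codes shown are examples only.

```python
import re

BIOGRAPHY_WORDS = {"autobiographical", "diary", "diaries", "memoirs"}
# Stand-in for the city keyword list with its country code translation table.
CITY_TO_COUNTRY = {"new york": "nyu", "london": "enk"}


def derive_fixed_fields(dewey, title, imprint_place, imprint_date):
    """Derive fixed field values from variable fields already identified."""
    fixed = {}
    if "[Fic]" in dewey:
        fixed["intellectual_level"] = "j"  # juvenile (assumed code value)
        fixed["fiction"] = "x"
    title_words = re.findall(r"[a-z]+", title.lower())
    if any(word in BIOGRAPHY_WORDS for word in title_words):
        fixed["biography"] = "a"
    year = re.search(r"\d{4}", imprint_date)
    if year:
        fixed["date"] = year.group()
    country = CITY_TO_COUNTRY.get(imprint_place.strip(" ,.").lower())
    if country:
        fixed["country"] = country
    return fixed


print(derive_fixed_fields("[Fic]", "Memoirs of a traveler", "New York,", "[1968]"))
```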

Simulation 

Before the format recognition programs were coded, a manual
simulation was conducted to test and, if necessary, improve the
algorithms, keyword lists, and input specifications for the
typists. Records for 150 English-language monographs, generally
consisting of "difficult" records, were selected for the
simulation. The source data, in this case the manuscript cards,
were input by the typists on the MTST according to the
specifications designed for format recognition. The simulators
added content designators to the MTST hard copy, following the
format recognition logical design. The simulation records were
processed by the current MARC system to produce printed
proofsheets.

The simulation records were then checked by RECON editors, who
were requested to keep track of the time required for proofreading
format recognition records and to submit an informal report on
their reactions to this work. Format recognition did appear to
decrease the amount of time spent in the combined editing and
proofing process, but it was demonstrated that success of the
program would require extensive training of the input typists,
since accurate typing of records would be essential for the
algorithms to work, as well as training of the editors to alert
them to the kinds of errors the format recognition program might
make.

Program Structure 

The final tasks included the production of detailed flowcharts at
the coding level and the actual coding and program testing. Coding
began in July 1970, and final testing was completed in May 1971.

The format recognition program was written in Basic Assembler
Language for the IBM 360/40 operating under DOS.² Input to format
recognition consists of a tape created by a pre-edit program. On
the input tape, each data field (call number, main entry, etc.)
exists as a separate logical record followed by a field terminator
code. The last data field in the catalog record is followed by a
record terminator code. Logical records on the input tape are
blocked into a physical record, which is about 2,500 bytes in
length. Output from format recognition is a machine record in the
LC MARC processing format.³

The format recognition program is modular in design to facilitate
addition of new algorithms or changes in existing algorithms. The
program consists of a mainline routine and five principal
routines. The mainline routine, which controls the overall
processing, consists of opening of input, output, and keyword list
files, calling for the five principal routines as required, and
closing of the output file when an end-of-record condition is
sensed in the input file. The principal routines are as follows:

1) Step 1 (set up record routine) builds the framework of the MARC
processing format, initializes fixed fields, flags, and indicators,
and builds those fields of the record which are not dependent on
the content of the input data.

2) Step 2 (input and identify record routine) reads the logical
records on the input tape created by the pre-edit program until it
reaches the record terminator code signifying the end of the
catalog record. Data that have been input as a cluster of multiple
variable fields (e.g., the title paragraph, containing the title
statement, edition, or imprint statement, and the collation
paragraph, consisting of the collation, series, or price) are
separated into individual fields. Preliminary identification of
each data field is made, and a preliminary record directory entry
consisting of the first two digits of the MARC tag, a sequence
number for the tag, the starting character position of the field in
the input buffer area, and the field length is built for each field
identified.

3) Step 3 (process input field routine) sorts the record directory
by tag and sequence number.⁴ Each variable field is processed to:
a) build the remainder of the field's entry in the record
directory; b) derive the third digit of the tag; c) set variable
field indicators which can be generated from analysis of the field
being processed; d) delimit and assign subfield codes; and e) set
any fixed field information which can be derived from the content
of the variable field being processed.

4) Step 4 (complete variable field processing routine) completes
the processing of the variable fields. The "look-back-and-fix-up"
technique takes place in this step. For example, the geographic
area code and the language codes are assigned on the basis of
analysis performed in the preceding steps. In addition, all final
text cleanup is performed, and the record built by the processing
routines is moved and assembled in the output area.

5) Step 5 (output record routine) sorts the completed record
directory again by tag and sequence number. Final adjustments are
made in assembling the parts of the record, including final entries
in the record's communications area. The record is then written on
the output tape.
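The preliminary record directory built in step 2 and sorted in step 3 can be pictured as follows. The class and function names are hypothetical, and the in-memory layout is an assumption made for illustration; only the four items carried in each entry (first two digits of the tag, sequence number, starting position, and length) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    tag_prefix: str  # first two digits of the MARC tag
    sequence: int    # occurrence number of this tag prefix within the record
    start: int       # starting character position in the input buffer
    length: int      # field length in characters


def build_directory(identified_fields):
    """Build preliminary entries from (tag_prefix, text) pairs in input order."""
    entries, seen, position = [], {}, 0
    for tag_prefix, text in identified_fields:
        seen[tag_prefix] = seen.get(tag_prefix, 0) + 1
        entries.append(DirectoryEntry(tag_prefix, seen[tag_prefix], position, len(text)))
        position += len(text)
    return entries


def sort_directory(entries):
    """Order the directory by tag and then by sequence number."""
    return sorted(entries, key=lambda entry: (entry.tag_prefix, entry.sequence))


fields = [("05", "PZ7.S65"), ("10", "Smith, John."), ("65", "Dogs."),
          ("24", "A title."), ("65", "Cats.")]
for entry in sort_directory(build_directory(fields)):
    print(entry)
```

Because each entry records where its text sits in the input buffer, the directory can be reordered without moving the field data itself.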

The mainline routine calls the principal routines in order (1-5
above) and repeats the process for as many catalog records as are
on the input tape. Each routine exits to the mainline, which calls
for the next routine until an end-of-file condition is found and
the processing completed.

The program consists of six levels of routines, and the routines
at each level can be called for execution by the routines at the
next highest level in the hierarchy. The level concept is shown as
follows:



Level                       Called by

1. Mainline routine
2. Steps 1-5                Mainline routine
3. Substeps                 Steps 1-5
4. Level 0 subroutines      Substeps
5. Level 1 subroutines      Level 0 subroutines
6. Level 2 subroutines      Level 1 subroutines


The communications buffer, containing pertinent data that may be
required by any of the routines (mainline, steps, substeps, and
levels 0-2 subroutines), is the vehicle of communications across
all routines. This buffer is assembled as an entity, and all
routines in the format recognition program are linked with it as
each routine is assembled. This allows all routines to be written
as reentrant routines.⁵ The current format recognition program is
an off-line process; however, the Multiple Use MARC System (MUMS)
may provide on-line processing and multiple terminal access to the
format recognition program.⁶ Existence of the communications buffer
and the reentrant routines will facilitate the modifications
required to incorporate format recognition in an on-line process.

The keyword lists used by the format recognition program are
maintained as a separate data set on a 2314 disk pack but are
stored in memory when the format recognition program is running.
The total amount of core storage required for the format
recognition program under DOS is approximately 120K: 80K for the
program and 40K for the keyword lists.

Peripheral Programs

Two peripheral programs were written to support the format
recognition project. Format recognition test data generation
(FORTGEN) is an assembly language program which provides test data
for format recognition by stripping MARC records of tags,
delimiters, indicators, and subfield codes and reformatting the
data to be identical with output from the pre-edit program. FORTGEN
can process any given number of MARC records at any desired point
in the MARC data base. Thus, a large quantity of high quality test
data was provided without additional keying.
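The idea behind FORTGEN can be sketched as follows. The record representation (a list of dictionaries), the function name, and the terminator values are assumptions made for illustration; the report does not give the byte values used on the pre-edit tape, so the two control characters below are placeholders.

```python
# Placeholder terminator codes; the actual pre-edit tape values are not given.
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"


def strip_content_designators(record):
    """Reduce a tagged record to bare field text in pre-edit style.

    Each field is assumed to be a dict with 'tag', 'indicators', and
    'subfields' (a list of (code, text) pairs). Only the text survives.
    """
    stripped = []
    for field in record:
        text = " ".join(subfield_text for _, subfield_text in field["subfields"])
        stripped.append(text + FIELD_TERMINATOR)
    return "".join(stripped) + RECORD_TERMINATOR


record = [
    {"tag": "100", "indicators": "10",
     "subfields": [("a", "Smith, John William,"), ("d", "1898-1965,"), ("e", "comp.")]},
    {"tag": "300", "indicators": "  ",
     "subfields": [("a", "xii, 320 p."), ("c", "22 cm.")]},
]
print(repr(strip_content_designators(record)))
```

Running format recognition on such stripped data and comparing the result with the original tagged record gives a direct measure of the program's accuracy.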

The keyword list maintenance program (KLMP) is an assembly
language program which creates and maintains the 60 keyword lists
used by the format recognition program in processing bibliographic
records. The lists, associated tables, and control data are
referred to collectively as "keyword list structures." The
principal functions of KLMP are to read the entire set of keyword
list structures from the file on disk, modify them as specified by
parameter cards, and write a new file on disk. The individual
functions performed by KLMP include the following: 1) create a
list; 2) remove a list; 3) add a keyword; 4) delete a keyword; 5)
augment a table by adding new codes to the translation table to
generate codes such as the geographic area code, language code,
country of publication code; and
6) print an entire list or selected portions.

This program provides the flexibility re- 
quired to change or update the keyword lists 
which are expected to be dynamic in nature. 
New lists will be added as format recognition 
is extended to include other languages, and 
keywords will be added to or deleted from 
existing lists as experience is gained in the 
use of format recognition. If the keyword lists
were built into the format recognition pro-
gram itself, it would be necessary to recatalog
the program each time a keyword was changed.
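
The six KLMP functions listed above can be sketched as operations
on an in-memory structure driven by a list of parameter cards. This
is a minimal illustration in Python, not the assembly language
program itself: the class name, the card verbs (CREATE, REMOVE,
ADD, DELETE, AUGMENT, PRINT), and the sample keywords are
assumptions, and the reading and writing of the disk file are
omitted.

```python
from typing import Dict, List, Set


class KeywordListStructures:
    """Hypothetical stand-in for the keyword lists and translation tables."""

    def __init__(self) -> None:
        self.lists: Dict[str, Set[str]] = {}
        self.translation: Dict[str, Dict[str, str]] = {}

    def create_list(self, name: str) -> None:
        if name in self.lists:
            raise ValueError(f"list {name!r} already exists")
        self.lists[name] = set()

    def remove_list(self, name: str) -> None:
        del self.lists[name]

    def add_keyword(self, name: str, keyword: str) -> None:
        self.lists[name].add(keyword)

    def delete_keyword(self, name: str, keyword: str) -> None:
        self.lists[name].discard(keyword)

    def augment_table(self, table: str, term: str, code: str) -> None:
        self.translation.setdefault(table, {})[term] = code

    def print_list(self, name: str) -> List[str]:
        return sorted(self.lists[name])


def apply_parameter_cards(store: KeywordListStructures, cards) -> List[str]:
    """Apply each card in order; return the lines produced by PRINT cards."""
    operations = {
        "CREATE": store.create_list,
        "REMOVE": store.remove_list,
        "ADD": store.add_keyword,
        "DELETE": store.delete_keyword,
        "AUGMENT": store.augment_table,
        "PRINT": store.print_list,
    }
    printed: List[str] = []
    for verb, *args in cards:
        result = operations[verb](*args)
        if verb == "PRINT":
            printed.extend(result)
    return printed


# Invented cards, for illustration only.
cards = [
    ("CREATE", "publisher_terms"),
    ("ADD", "publisher_terms", "press"),
    ("ADD", "publisher_terms", "verlag"),
    ("ADD", "publisher_terms", "pres"),
    ("DELETE", "publisher_terms", "pres"),
    ("AUGMENT", "language_code", "german", "ger"),
    ("PRINT", "publisher_terms"),
]
store = KeywordListStructures()
printed = apply_parameter_cards(store, cards)
print(printed)
```

Keeping the lists as data outside the recognition program is the
design point made in the text: a keyword change becomes a new card
rather than a change to the program.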

Format Recognition Production 

Approximately 17,000 RECON records in the
196f card series have been processed by the
format recognition program since actual pro-
duction began in May 1971. RECON records
rather than current MARC records were used
to test format recognition because RECON rec-
ords were not sent out at regular intervals by
the MARC Distribution Service. Current MARC
records have been processed by format recogni-
tion since January 1972.

The workflow for the manual editing process
involves editing the records, keying them on
the MTST, processing these records on the com-
puter (including converting the MTST tape cas-
sette to computer-compatible tape), proofing,
and verifying. Format recognition eliminates
the editing process, as shown in Figure 4-1.

The time estimated in the RECON report for
format recognition processing using unedited
records as input was four seconds per record.⁷
The actual machine time for production runs
is 1/2 second per record, plus approximately
1/4 second per record for the pre-edit program, or
a total of 3/4 second per record. This processing
time compares favorably with that for the cur-
rent MARC system (pre-edit, format edit, and
content edit programs) which is approxi-
mately three seconds per record.⁸

Although a decrease in machine processing
time of approximately 2 1/4 seconds per record,
when projected over thousands of records, rep-
resents a significant gain, the principal hope
for format recognition lies in relieving the
human burden of editing and increasing over-
all production rates. Production and error rates
for RECON editors were tallied for both conver-
sion procedures: 1) editing records before in-
put and proofing and 2) proofing format
recognition records. Table 4-1 shows the re-
sults of this analysis.
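
The machine-time comparison above reduces to simple arithmetic,
sketched here in Python. The two constants are the rounded figures
quoted in the text (about three seconds per record for the current
MARC system and a saving of about 2 1/4 seconds); the function
names and the choice of 17,000 records as an example volume are
illustrative assumptions.

```python
from fractions import Fraction

# Approximate per-record figures quoted in the text.
CURRENT_MARC_SECONDS = Fraction(3)   # pre-edit, format edit, content edit
SAVING_SECONDS = Fraction(9, 4)      # 2 1/4 seconds saved per record


def format_recognition_seconds() -> Fraction:
    """Machine time per record implied for pre-edit plus format recognition."""
    return CURRENT_MARC_SECONDS - SAVING_SECONDS


def projected_saving_hours(records: int) -> float:
    """Machine hours saved when the per-record saving is projected over a run."""
    return float(SAVING_SECONDS * records / 3600)


print(format_recognition_seconds())
print(projected_saving_hours(17_000))
```

Even at this volume the saving amounts to hours rather than days of
machine time, which is consistent with the text's emphasis on the
editing workload as the principal gain.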

The format recognition production rate of
8.4 records per hour (proofing only) repre-
sents a significant increase over the 4.6 rec-
ords per hour for the combined editing and
proofing process. Because proofing format rec-
ognition records is more difficult, the rate is
less than that for proofing edited records
(about 10 per hour). With format recognition
records, the editors must be aware of the er-
rors made by the program, which can be quite
different from errors made by human editors,
as well as keying mistakes. These rates were
calculated over a relatively short period, and it
is possible that the editors' production would
rise as they gained more experience; however,

Figure 4-1. Workflows for two methods of inputting MARC records

Table 4-1. Editing/proofing production and error rates for edited and unedited records

                                 Production rate   Error rate ¹
Type of record                   (per hour)        (per batch of
                                                   20 records)

Edited records                        4.6               3.0
  Editing only                        9.2               3.0
  Proofing only                       9.2               3.0
  Comments: Production rate figure is RECON Unit average from
  February 1970-April 1971. Editing error rate figure is RECON
  Unit average from a 10 percent sampling of 6000 records edited
  from July-August 1970. Proofing error rate figure is RECON Unit
  average from June 1970-April 1971.

Format recognition                    8.4               2.5
(combines editing and proofing)
  Comments: Production rate and error rate are RECON Unit
  averages from May 10-May 30, 1971.

¹ The values in this column represent the average number of actual
substantive errors per batch made by the editors and/or proofers.
These errors include misassignment of tags, indicators, and subfield
codes or failure to correct errors in the data content itself, e.g.,
misspelling of author names. In the case of format recognition, the
error rate includes errors made by the format recognition program
and not corrected by the proofers.



it is also recognized that with a repetitive task
like proofing, it is unlikely that production
rates would continue to rise once a certain
plateau has been reached.

Format Recognition Typing 

Tests were also conducted to compare pro- 
duction rates and keying errors of input typ- 
ists at the Library for edited records and for 
unedited format recognition worksheets. Table 
4-2 shows the results of these tests. Of the 
1,39>^ errors, 394 (28.3 percent) caused the 
format recognition program to misidentify
data, i.e., to assign incorrect content designa- 
tors. Typing speed, however, was slightly 
higher in keying format recognition records 
since there were no content designators to be 
typed. 

Since there is a need for typing accuracy in
format recognition and since it is possible that
future requirements would necessitate the use
of a contractor to support the LC staff, a typ-
ing test was also conducted by the contractor
that had input RECON records. The contractor
was asked to type 1,000 records for input to
format recognition, with special emphasis on
accuracy in typing. A goal of one error in 20
records was established. The first 900 records
were considered practice records, and the final
evaluation was based on the last 100 records.
Since the contractor's proofsheets did not dis-
tinguish between uppercase and lowercase
characters (i.e., a shift character was used to
indicate an uppercase condition), the LC
proofsheets produced after the records had
been processed by format recognition were
used to tally errors. Only errors made in typing
were tallied; errors made by the format recog-
nition program were ignored.

To attain an accuracy level of one error in
20 records, two proofreading cycles were re-
quired by the contractor before the records
were submitted for input to format recogni-
tion. An accuracy level of one error per 7.7
records was attained with one proofreading
cycle. Since experience has shown that errors
are also made when correction records are



Table 4-2. Keying production and error rates for edited and unedited records

                                                          Error rate
                                              Number of   Number        Percent
                             Production rate  records     records       records
Type of record               (per hour)       checked     with errors   with errors

Edited records                    12.9           300         178           59.3
Format recognition records                     2,879       1,029           35.8






typed, the actual error rate in an operational
situation would probably be slightly higher.

The typing was done by an expert typist who
had a high school education and 18 months' ex-
perience typing bibliographic records. Train-
ing time was minimal (one day). The typist
performed the first proofreading. The second
proofreading was done by an employee who
had only three months' experience with biblio-
graphic records but who had completed two
years of college and one year with a profes-
sional secretarial concern. Although a longer
training period was required (one week), she
showed considerably greater ability in detect-
ing errors. This limited experience does not
offer conclusive evidence but does seem to indi-
cate that ability beyond the range of an aver-
age typist is needed to detect the kinds of
errors that occur in the typing of catalog rec-
ords. It should be noted that the LC input
typists do not proofread their records before
computer processing because typing errors
cannot be corrected very efficiently on the
MTST after the record has been completely
typed.

Assuming a production figure of 100,000 
records, the contractor estimated that records 
could be delivered with an accuracy level of 
one error per 20 records at a cost of $0.85 per 
record. With an accuracy level of one error in 
10 records, the cost would be $0.75 per record. 
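
The contractor's two quotations can be compared with a short
calculation, sketched here in Python. The prices, accuracy levels,
and the 100,000-record volume are those given above; the function
names and the derived "cost per error avoided" are illustrative and
were not part of the contractor's estimate.

```python
RECORDS = 100_000

# Accuracy level (records per typing error) -> price per record in cents.
PRICE_CENTS = {20: 85, 10: 75}


def total_cost_dollars(records_per_error: int, records: int = RECORDS) -> int:
    """Total contract cost in whole dollars at the quoted price."""
    return PRICE_CENTS[records_per_error] * records // 100


def expected_errors(records_per_error: int, records: int = RECORDS) -> int:
    """Expected number of typing errors at the stated accuracy level."""
    return records // records_per_error


extra_cost = total_cost_dollars(20) - total_cost_dollars(10)
errors_avoided = expected_errors(10) - expected_errors(20)
cost_per_error_avoided = extra_cost / errors_avoided

print(total_cost_dollars(20), total_cost_dollars(10))
print(extra_cost, errors_avoided, cost_per_error_avoided)
```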

The format recognition typing test was also
used to determine whether printed cards could
be produced on a photocomposition device from
unproofed format recognition records (assum-
ing one typing error per 20 records). Since
the kinds of errors created by the format rec-
ognition program would not generally affect a
print program, it was thought that a large
number of records could be converted to
machine-readable form in this manner. Al-
though the format recognition records would
not be proofed, it was expected that the
printed cards produced as a byproduct of this
process would be checked against the original
copy for omission of fields, data elements,
etc. Although the results were satisfactory,
this project was not implemented because of
the many problems involved in maintaining
these records in a separate data base and up-
dating them for bibliographic content or MARC
content designators. It did not appear advisable
to convert a large number of catalog records
to machine-readable form without providing
full retrieval capability for these records at
the Library of Congress as well as the poten-
tial for printing.

More experience is required to stabilize for-
mat recognition production rates for both
proofing and typing. Since production rates
have increased with no change in the number
of personnel and machine processing time has
actually decreased, it can be stated that con-
version via format recognition is more econom-
ical than conversion using records completely
edited prior to input. Monitoring of production
rates and costs has continued beyond the ter-
mination date of the RECON Pilot Project.

Expansion of Format Recognition to 
Foreign Languages 

The format recognition algorithms were
formulated to process records for English-
language monographs. Because of the commit-
ment to investigate foreign-language material
under the RECON Pilot Project and the planned
expansion of the MARC Distribution Service to
include records in other roman-alphabet lan-
guages, the Library was also interested in ana-
lyzing the requirements to expand format rec-
ognition to include the processing of records in
foreign languages.

Although the computer programs have not
yet been modified to handle foreign languages,
analysis of the algorithms and the necessary
modifications to the program specifications
have been completed for French and German
titles. German keyword lists have been created
and converted to machine-readable form. This
phase of format recognition will continue as an
on-going effort of the MARC Development
Office.

Results 

As work progressed on format recognition,
it became evident that the success of this proj-
ect depended heavily on standard cataloging
practices in recording data and in using punc-
tuation. Format recognition was originally
designed to accept cataloging data based on
the Anglo-American Cataloging Rules, but
modifications were necessary to accommodate
the catalog records created by the Shared
Cataloging Program at the Library, which
uses entries from various national bibliogra-
phies and adapts them for LC printed cards.

Development of the International Standard
Bibliographic Description (ISBD) has broad
implications for format recognition and the
creation of machine-readable data bases. As a
result of the International Meeting of Catalog-
ing Experts sponsored by the International
Federation of Library Associations and held in
Copenhagen in August 1969, a working party
was appointed to prepare a draft proposal for
an International Standard Bibliographic De-
scription. The objective was to formulate spec-
ifications for bibliographic description, in-
cluding a standard order of data elements, a
minimum set of mandatory data elements, and
standard punctuation. Use of the ISBD by
national bibliographies and cataloging agen-
cies would aid in the interpretation of catalog-
ing data by humans and by format recognition
programs. If all cataloging agencies were to
prepare their entries according to the ISBD, for-
mat recognition algorithms could then be more
easily expanded to encompass foreign-language
catalog records produced by the Shared Cata-
loging Program as well as those originating at
the Library.



Notes 

¹ United States. Library of Congress. Information Sys-
tems Office. Format Recognition Process for MARC Rec-
ords; a Logical Design (Chicago, Information Science
and Automation Division, American Library Associa-
tion, 1970). 1 v. (various pagings).

² A converted version has been written to operate under
OS.

³ Henriette Avram, and others. "MARC Program Re-
search and Development: A Progress Report." Journal
of Library Automation, 2:242-249 (Dec. 1969).

⁴ A sequence number or site number is used to distin-
guish variable fields that have identical tags.

⁵ "When processing a variety of input messages, one
program may be 'WAITing' (for a file action, for exam-
ple) and at this time another transaction wishes to use
the program. This can cause problems if the program
is written in such a way that it modifies itself while
being executed, or stores logic information for later
use in a location other than the unique message-refer-
ence block. Programs to be used by multiple transac-
tions in this way must be carefully written so that no
logic error can be caused by this. In particular, they
must not modify themselves in such a way that, when
control is taken away from them, another transaction
can interfere with the modification. Programs which
can be entered by multiple transactions without inter-
ference are referred to as reentrant programs. If a
program is not reentrant, it may be necessary to have
more than one copy of it in core at certain times in a
multi-thread environment." From James Martin's De-
sign of Real-Time Computer Systems (Englewood
Cliffs, N.J., Prentice-Hall, 1967), p. 148.

⁶ The Multiple Use MARC System (MUMS) is being
designed by the MARC Development Office as a data
utility. This system will be capable of processing ma-
chine-readable records regardless of the source of the
record, the content of the record, and the master file
on which the record will reside. It will also include
all processing required to store, maintain, and retrieve
records in both on-line and off-line modes. This project
is still in the developmental stages.

⁷ RECON Working Task Force. Conversion of Retrospec-
tive Catalog Records to Machine-Readable Form; a
Study of the Feasibility of a National Bibliographic
Service (Washington, Library of Congress, 1969), p. 64.

⁸ Ibid., p. 63.

Chapter 5
RECON Costs



The RECON feasibility study projected costs
per record for 20 possible technical alterna-
tives for large-scale retrospective conversion.¹
The four most likely alternatives were: 1)
direct-read optical character readers, format
recognition processing; 2) no editing, keying
using a magnetic tape inscriber, format recog-
nition processing; 3) partial editing, keying
using a magnetic tape inscriber, format recog-
nition processing; and 4) full editing, keying
using a magnetic tape inscriber.² The costs
were calculated for the combined man/ma-
chine effort, which included staff salaries and
overhead apportioned by function, i.e., project
direction, editing, keying, proofing, catalog
comparison, and quality control, as well as
selection of cards from the record set or micro-
filming of the cards and production of hard
copy; the input device; and computer process-
ing of the records. Derivation of these cost
figures is described in the feasibility study.³

In the pilot project, only the second and 
fourth alternatives were tested. The first alter- 
native was impractical because no existing 
device affords the requisite OCR capability. The 
third alternative was unnecessary because for- 
mat recognition processing of unedited records 
proved to be entirely satisfactory. 

Computation of the actual cost per RECON
record was complicated by two factors:

1) Since the RECON Production Unit was used
as a test facility for new devices and tech-
niques, normal production was often inter-
rupted and, therefore, production rates were
low.

2) Some RECON records were keyed by a
contractor.



The variations in production rates during
the lifetime of the project were such that a
reliable cost per record could not be obtained
by cost analysis based on total manpower costs
and total input to the master data base. The
use of contractual services resulted in unbal-
anced workloads, with peak periods of editing,
preparation of source documents for the con-
tractor, proofreading after records from the
contractor had been processed through the
MARC system, etc. This fluctuation in the work-
flow tended to create a bias in the cost calcula-
tions when the analysis included production
figures for records processed entirely by LC
staff as well as those edited and proofed by LC
staff but keyed by the contractor.

It was decided, therefore, to use the average
cost of a current MARC record as a basis for
calculating a simulated RECON cost. A great
deal of experience has been gained at the Li-
brary in the conversion of current catalog rec-
ords to machine-readable form, and cost fig-
ures have been maintained since the beginning
of the MARC Distribution Service. These costs
have been stable for more than a year and thus
may be considered valid. RECON production
and MARC production are functionally identical
with the exception of selection of records from
the record set and catalog comparison. The
costs of record selection and catalog compari-
son in the simulation of RECON costs were
based on actual RECON Pilot Project experience.

Table 5-1 shows simulated manpower costs
for the technical alternatives used in the pilot
project. The simulated RECON cost for any
given alternative is about 15 percent higher
than the comparable MARC cost because of the
need for record selection and catalog compari-
son. It should also be noted that the latter costs
would tend to increase as the conditions of

Table 5-1. Simulated costs of converting RECON records by two different methods ¹

                           Full editing          Format recognition
Function                   Percent   Average     Percent   Average

Total                       100.0     $3.46       100.0     $3.06
Record selection              6.1       .21         6.9       .21
Catalog comparison            5.5       .19         6.2       .19
Editing and revising         11.6       .40
Typing                        9.2       .32        10.5       .32
Proofing                     16.8       .58        18.9       .58
Verifying                    17.0       .59        19.3       .59
Other duties                 19.4       .67        21.9       .67
Leave and holidays           14.4       .50        16.3       .50

¹ Derived from MARC conversion costs, July 1970-June 1971.


retrospective conversion changed, that is, 
when smaller subsets of the total data base 
were selected or older records were processed. 
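
The figures in Table 5-1 can be cross-checked with a short Python
calculation. The component costs are transcribed from the table (in
cents, to avoid rounding noise); the full-editing total is taken as
the sum of its listed components, and the "premium" function is one
possible reading of the statement that RECON costs run about 15
percent above comparable MARC costs. The names used are
illustrative.

```python
# Average manpower cost per record, in cents, from Table 5-1.
FULL_EDITING = {
    "Record selection": 21,
    "Catalog comparison": 19,
    "Editing and revising": 40,
    "Typing": 32,
    "Proofing": 58,
    "Verifying": 59,
    "Other duties": 67,
    "Leave and holidays": 50,
}

# Format recognition has the same components minus editing and revising.
FORMAT_RECOGNITION = {
    name: cents for name, cents in FULL_EDITING.items()
    if name != "Editing and revising"
}

# Functions required for RECON but not for current MARC conversion.
RECON_ONLY = ("Record selection", "Catalog comparison")


def total_cents(costs: dict) -> int:
    return sum(costs.values())


def recon_premium_percent(costs: dict) -> float:
    """RECON-only cost as a percentage of the remaining (MARC-like) cost."""
    recon_only = sum(costs[name] for name in RECON_ONLY)
    marc_equivalent = total_cents(costs) - recon_only
    return 100 * recon_only / marc_equivalent


for label, costs in (("Full editing", FULL_EDITING),
                     ("Format recognition", FORMAT_RECOGNITION)):
    print(label, total_cents(costs) / 100, round(recon_premium_percent(costs), 1))
```

Under this reading the premium is close to 15 percent for the
format recognition alternative and somewhat lower for full editing,
since the same 40 cents is spread over a larger base.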

Machine costs have been omitted from the
table because they do not lend themselves to
accurate proration per record. For example,
the total cost of the input device per record is
affected by the number of devices sharing the
converter and the number of characters keyed.
The RECON feasibility study prorated the con-
verter over 20 keying devices; in the present
study, 10 keying devices were used as a basis
for calculation (see Chapter 7). The feasibility
study also assumed 325 characters per record
for unedited records and 500 characters per
record for fully edited records.⁴ Production
experience showed an average of 398 charac-
ters per record for fully edited records.

On the basis of the above data, the cost per
input device for fully edited records was deter- 
mined to be $.07 per record, a figure which 
compares favorably with the prediction of 
$.063 in the feasibility study. Experience in 
the project was insufficient to permit an accu- 
rate evaluation of the projected cost for the 
input device of $.041 per record for typing 
unedited records, but an increase in the num- 
ber of records keyed was noted in spite of the 
requirement for greater typing accuracy in 
format recognition. 

Cost estimates for the hardware and soft-
ware configuration required for a national
bibliographic service remain valid since noth-
ing has been found to contradict the assump-
tions made.⁵ Present hardware costs, com-
pared to those given in the feasibility study,
could influence total costs, but there is nothing
to be gained by updating one estimate with
another at this time.

Costs obtained during the RECON Pilot Proj-
ect cannot be compared on a one-to-one basis
with projected figures in the feasibility study
for several reasons:

1) The projected costs were based on the expe- 
rience of the MARC Pilot Project and the MARC 
Distribution Service in the earliest days of its 
implementation. 

2) Government salaries have been upgraded 
several times since the projected costs were 
calculated. 

3) RECON production experience dictated a 
modification of the techniques postulated in 
the RECON feasibility study. 

The study assumed procedures that involved
keying, input processing, sorting of records,
production of proofsheets, comparison of
proofsheets with records arranged by main
entry to records in the Official Catalog, and
keying of corrections to records requiring
changes. During actual production, it was
found that the process of catalog comparison
and updating of the record was facilitated by
using the input worksheet (consisting of a
copy of the printed card) rather than the com-
puter-produced proofsheet. This procedure
eliminated the necessity to sort the records by
main entry before producing the proofsheets.
Use of the source document for catalog com-
parison also allowed changes to be made to the
record before the first keying, thus eliminating
additional keying and proofing.

The feasibility study also assumed the exist-
ence of quality control procedures, consisting
of an inspection of 50 percent of the total
number of records after the first proofing. A
first sampling of 10 percent of all converted
records would result in acceptance of 55 per-
cent of the batches. This 10 percent sample
plus total inspection of 45 percent of the re-
maining 90 percent would provide 50.5 percent
inspection.⁶ Since production rates had not
reached the proportions assumed in the feasi-
bility study, the population was not large
enough to have confidence in the proposed
sampling technique. The majority of RECON
records were checked by a verifier in addition
to the initial proofing, and although similar
inspection procedures were tested toward the
end of the pilot project, these were initiated
as part of overall quality control for the MARC
Editorial Office and were not reflected in
RECON costs.
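
The inspection percentage quoted from the feasibility study can be
reproduced with a short Python calculation. It assumes, as the text
implies, that the batches not accepted on the first sample (100
minus 55 percent) are then inspected in full; the function name and
parameter names are illustrative.

```python
from fractions import Fraction


def inspected_share(sample: Fraction, accepted: Fraction) -> Fraction:
    """Share of all records inspected under the two-stage scheme.

    sample   -- share of records drawn in the first sampling
    accepted -- share of batches accepted on the basis of that sample
    """
    not_accepted = 1 - accepted
    # First-stage sample, plus full inspection of what remains of the
    # batches that were not accepted.
    return sample + not_accepted * (1 - sample)


share = inspected_share(Fraction(10, 100), Fraction(55, 100))
print(float(share * 100))
```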

4) The RECON feasibility report predicted that
format recognition would add an incremental
cost to the total cost of conversion; however,
format recognition resulted in the savings of
2 1/4 seconds of machine time per record.

5) The RECON feasibility study suggested
selecting the records for conversion (e.g., all
English-language records from 1960 to date)
from the Card Division record set, microfilm-
ing the records, and reconstituting the record
set.

In Chapter 8 of this report, it is recommended
that the part of the record set containing the
subset of records chosen for conversion be
microfilmed (e.g., all records, English-
language and others, from 1960 to date) and
the records for English-language monographs
be selected from the microfilm copy. Since rec-
ords for the RECON Pilot Project were limited
to English-language records cataloged in 1968
or 1969, they were selected from cards in stock
rather than from the record set. Since stock
cards did not have to be replaced, microfilming
was not necessary during the pilot project.

Notes 

¹ RECON Working Task Force. Conversion of Retrospec-
tive Catalog Records to Machine-Readable Form; a
Study of the Feasibility of a National Bibliographic
Service (Washington, Library of Congress, 1969), p.
224-226.
² Ibid., p. 98-99.
³ Ibid., p. 39-96.
⁴ Ibid., p. 59.
⁵ Ibid., p. 68-73; 183-223.
⁶ Ibid., p. 83-85.

Chapter 6
Research Titles Study



Background 

Since the production operations of the RECON
Pilot Project were limited to English-language
monograph records with 1968, 1969, or 7-series
card numbers, it was recognized that many
problems in converting retrospective records
would not be revealed except by a research
effort. For this reason, a project was under-
taken to identify and analyze 5,000 research
titles, consisting of records for older English-
language monographs and foreign-language
monographs in roman alphabets. These records
would be studied for problems concerning: 1)
earlier cataloging rules which caused certain
elements to be omitted from the record or
transcribed in a different style; 2) printed
card formats which placed elements in differ-
ent locations; 3) elements in languages unfa-
miliar to the editor, such as foreign place
names; 4) cataloging originating in different
countries under the Shared Cataloging Pro-
gram; and 5) expansion of format recognition
to cover these kinds of records.

Two sources of research records were ini-
tially considered: 1) the project to compile a
book catalog of the Main Reading Room refer-
ence collection; and 2) the popular titles of
the Card Division. Both sources were studied
for the degree of overlap of titles and suitabil-
ity for RECON purposes.

The characteristics of the Main Reading
Room reference collection were studied first.
To compile the book catalog of this collection,
printed cards were obtained from Card Divi-
sion stock for conversion to MARC. The cards
represent a wide range of material cataloged
from 1899 to the present. Approximately one-
fourth to one-third of the estimated 14,000
titles are serials. Most of the roman-alphabet
languages currently processed at the Library
are included, as well as the more common non-
roman-alphabet languages such as Russian,
Japanese, and Hebrew. The catalog contains a
number of "difficult" titles, such as encyclope-
dias or dictionaries, which present a variety of
cataloging and conversion problems.

The popular titles from the Card Division
were studied next. As part of Phase I of the 
Card Division Mechanization Project, a record 
was kept of the number of orders received for 
each LC printed card. A printout which con- 
tained 39,148 card numbers for titles with 10 
or more orders was produced. A sampling tech- 
nique was developed to determine the percen- 
tage of overlap between this list and the titles 
in the Main Reading Room reference collection. 

The estimated number of matches indicated
that there was not enough overlap between the
Main Reading Room catalog and the Card Di-
vision popular titles to consider a selection of
titles that would serve the needs of these proj-
ects as well as those of RECON. It was decided
that records from the Main Reading Room
catalog would be more suitable as the first
source from which to choose RECON research
titles.

Selection 

To obtain 5,000 records for the research
titles study, approximately 1,800 cards were
selected from the Main Reading Room catalog.
The remaining 3,200 records, consisting of
current foreign-language cataloging, were
selected from printed cards drawn from Card
Division stock for RECON production efforts.
Emphasis was placed on foreign-language rec-
ords in French and German, since titles in
these two languages constitute a large propor-
tion of the Library's foreign-language catalog-
ing. Other roman-alphabet languages were also
represented.

An analysis of the problems that would be
encountered in converting the research titles
showed some similarity between the cataloging
of older records (pre-1949) and of current
foreign-language records based on shared cata-
loging copy. Certain stylistic conventions, e.g.,
the use of ellipses or the transcription of im-
print statements, were similar for both kinds
of material. It was felt that a thorough knowl-
edge of the 1908 ALA Cataloging Rules would
be necessary in order to interpret correctly
the data on the older printed cards during a
conversion project.

Editors in the RECON Production Unit have
found that assignment of content designators
for retrospective records, even those cataloged
during 1968 or 1969, requires a considerable
amount of interpretation. For pre-1949 rec-
ords, the problem becomes more acute because
the editors must attempt to interpolate the
procedures and techniques for current mate-
rial to older records. It is likely that higher
level personnel would be required to process
these records since in some instances the
changes would be similar to recataloging the
entire record.

Different cataloging rules and printing con-
ventions created even more serious problems
for the expansion of format recognition to
cover older catalog records and records based
on shared cataloging copy. Each national bib-
liography, from which shared cataloging copy
is derived, has its own rules and style of cata-
loging. For works in German, for example,
punctuation, style of cataloging, and printing
conventions may vary among entries from
West German, East German, Austrian, and
Swiss bibliographies, all of which may also
differ from LC practices. The analysis also pro-
vided the basis for expansion of format recog-
nition to include foreign languages (see Chap-
ter 4).

Foreign Language Test 

Decisions were subsequently reached on how
to handle problems encountered in the analysis 
of the research titles (see Appendix I). These 



Table 6-1. Production and error rates in the foreign-language editing test

4 days
German   199   2.5 days
,65   13.0
Editor no.
German   200   4 days

English-language records
Editor no. 1 ¹   English   383   Estim.
hr.   56   3.0
Editor no. 2 ²   English
11.3 per
or 87%
Editor no.   English   139
hr.

¹ Editor no. 1: No firm figures available. (Daily editing statistics also included proofing and catalog comparison.)
² Editor no. 2: Figure taken from daily statistics averaged over a six month period.
³ Editor no. 3: Figure taken from daily statistics averaged over a six month period.

decisions were incorporated into editing in-
structions for the MARC Editorial Office as
well as for a foreign-language editing experi-
ment. The purpose of this experiment was to
determine if experienced editors without much
knowledge of foreign languages could maintain
acceptable levels of accuracy and rates of pro-
duction when editing foreign-language records.

Three trained editors edited a total of 1,180 
French and German monograph records, or 
approximately 196 records per editor in each 
language. Two of the editors had taken some 
college level courses in a foreign language. 

The test results revealed an error rate of 
approximately 50 percent, i.e., about 50 per- 
cent of the records contained editing errors. 
Production rates for the three editors and a 
comparison with their performance on Eng- 
lish-language records are given in Table 6-1. 
The number of error-free records varied from 
45 to 53 percent for French and 40 to 56 per- 
cent for German, as compared with 87 to 94 
percent for English. 

The editors made an average of 12.6 to 17.5 
errors per batch of 20 records for French and 
13.0 to 20.6 for German, compared with 1.6 to 
3.0 errors for English. Since a standard of 2.5 
errors per batch has been established by the 
MARC Editorial Office for trained editors, con- 
siderable improvement must be made before 
foreign-language records are converted on a 
production basis. 
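
The distance between these figures and the MARC Editorial Office standard can be made concrete with a little arithmetic. The following sketch uses only the per-batch figures quoted above; the variable and function names are illustrative assumptions and are not part of the report. It expresses each observed figure as a multiple of the 2.5-error standard and as errors per record.

```python
# Errors per batch of 20 records, as quoted in the text (low, high).
BATCH_SIZE = 20
STANDARD_ERRORS_PER_BATCH = 2.5  # standard for trained editors

observed = {
    "French": (12.6, 17.5),
    "German": (13.0, 20.6),
    "English": (1.6, 3.0),
}

def multiple_of_standard(errors_per_batch):
    """Express an errors-per-batch figure as a multiple of the standard."""
    return errors_per_batch / STANDARD_ERRORS_PER_BATCH

def errors_per_record(errors_per_batch):
    """Convert an errors-per-batch figure to errors per single record."""
    return errors_per_batch / BATCH_SIZE

for language, (low, high) in observed.items():
    print(
        f"{language}: {multiple_of_standard(low):.1f}x to "
        f"{multiple_of_standard(high):.1f}x the standard, "
        f"{errors_per_record(low):.2f} to {errors_per_record(high):.2f} "
        f"errors per record"
    )
```

On these figures the foreign-language batches run at roughly five to eight times the standard, while the English batches fall between about 0.6 and 1.2 times the standard.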

The high error rate for editing of foreign- 
language records was anticipated since it was 
known that approximately half of the editing 
effort is directly dependent upon an ability to 
read the words that make up the record. The 
remaining activities involve the identification 
of various data elements by their location on 
the printed card and are unrelated to language 
proficiency. 

The majority of the errors took the form of 
wrong subfield codes, erroneous placement of 
delimiters, and incorrect fixed field codes (see 
Table 6-2). Approximately one-third of the er- 
rors in subfield codes and delimiters appeared 
in the title field, where knowledge of the lan- 
guage is essential in order to identify the data 
elements correctly. The majority of the tagging 
errors could have been avoided by consulting the 
name authority records in the Official Catalog. 

Although all of the editors had had some 
training in French and none in German, their 
editing speed for French was only slightly 
higher than that for German. Since the editors 
began the experiment by editing French rec- 
ords and had thus gained additional experience 
before working with German, it was decided 
to determine the effects of such experience on 
the results. The number of errors made by 
each editor in each batch of records was tallied 
to see if any improvement had taken place 
during the entire course of the test. No appre- 
ciable improvement was noted for any of the 
editors. It is doubtful that much improvement 
would be shown unless extensive training in 
the editing of foreign-language records were 
conducted. 

Table 6-2. Location and number of errors in foreign-language editing test 

* Error associated with title field. 

The test demonstrated that editors who are 
not fully knowledgeable in a foreign language 
cannot accurately edit records in that lan- 
guage without assistance. The editor's work 
must be carefully revised by someone with a 
reading knowledge of the language as well as 
an understanding of editing procedures. If the 
editor's work requires complete revision, actual 
editing time is of course drastically increased. 
During the test, it took the reviser almost 
twice as much time to correct the test records 
as it had taken the editors to edit them. Hav- 
ing language specialists edit such critical por- 
tions of the record as the title field and fixed 
fields would involve teaching them the editing 
procedures, and a staff of regular editors 
would still have to be maintained to edit the 
remaining portions of the foreign-language 
records. It was concluded that the more desir- 
able alternative would be to have editors who 
are proficient in the language of the records 
to be edited. Even with the advent of format 
recognition processing for foreign-language 
records, the editors would still have to deter- 
mine if the elements had been correctly identi- 
fied by the computer. 
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Chapter 7 
Input Devices 



Since July 1969, the Library of Congress 
has used the IBM Magnetic Tape Selectric 
Typewriter (MTST) as the input device for 
MARC data. The current MARC system is an off- 
line system. Experience at the Library has 
indicated that original input of bibliographic 
data does not call for an on-line system but 
that correction and verification procedures 
would be greatly enhanced by on-line capabil- 
ity. Keeping such requirements in mind and 
seeking a best method for conversion of a large 
retrospective file, a state-of-the-art review of 
input devices was therefore conducted to ac- 
complish the following: 1) compare new de- 
vices with the MTST and evaluate their relative 
efficiency for use in the LC environment; 2) 
determine if the development of direct-read 
optical character readers had progressed to 
the point that such equipment could be used to 
scan LC printed cards; 3) select a terminal 
device that would meet LC requirements for 
on-line correction and verification procedures; 
and 4) compare the use of a mini-computer 
with the present method of input (off-line to 
System 360) to determine if there were any 
technical or cost advantages to be gained. 

Since the input of data is still the slowest 
component of a computer system, and because 
there is a growing demand for larger charac- 
ter sets, a great deal of emphasis has been 
given by hardware manufacturers in the past 
few years to the development of more efficient 
and sophisticated devices. Naturally, any study 
on a subject as dynamic as input devices is out 
of date and incomplete at any point in time. 
Although the investigation continued through- 
out the life of the RECON Pilot Project, it is 
recognized that an ongoing study is necessary 
and that devices may exist that are not de- 
scribed in this report because 100 percent cov- 
erage was not possible. 

The investigation included an in-depth liter- 
ature search, inquiries to various manufactur- 
ers, attendance at meetings, and testing of 
selected equipment, in some cases in an opera- 
tional mode by the Keyboarding Unit of the 
MARC Editorial Office. 

Keyboard Devices 

In order for a device to compete with the 
MTST in the context of RECON evaluation, it had 
to meet the Library's keying requirements 
(easy accommodation of variable record 
lengths and the expanded character set) and 
either cost less than the MTST or increase pro- 
duction substantially to offset any increase in 
price. The equipment monitored, which in- 
cluded both off-line and on-line devices, has 
been categorized for this report as follows: 

1) Key to magnetic tape 

2) Key to computer-compatible magnetic 
tape 

3) Key to disk 

4) Key to cassette 

Key-to-magnetic-tape systems consist of a 
number of input devices under centralized elec- 
tronic control that acts as a routing and record- 
ing device. The control component may have 
the sophistication of a mini-computer with the 
facility to perform many functions such as 
editing, formatting, etc. In either case, one 
characteristic of this system is its ability to 
handle a large number of input devices simul- 
taneously. The devices categorized as key- 
to-computer-compatible-magnetic-tape systems 
may either stand alone or share a centralized 
control device, called a "pooler," which records 
the information from a number of input de- 
vices onto one magnetic tape. A characteristic 
of the pooler is that it handles fewer input 
devices simultaneously than the key-to- 
magnetic-tape system. The key-to-disk sys- 
tem operates in the same way as the key- 
to-magnetic-tape system. Devices in the 
key-to-cassette category require a converter to 
go from cassette to computer-compatible mag- 
netic tape. 

Table 7-1, compiled in May 1970, summar- 
izes the characteristics of the devices moni- 
tored. Although prices were considered in the 
actual analysis, they have been omitted from 
the table to avoid the confusion that might be 
caused by out-of-date information. 

The majority of devices available today do 
not satisfy the requirements for input of bib- 
liographic data, the principal limitation being 
in the number of available characters. Among 
those evaluated, the Keymatic Magnetic Tape 
Unit appeared to offer enough potential ad- 
vantage, despite higher cost, to warrant fur- 
ther exploration. 

Keymatic Data System Model 1093 

The primary attraction of the Keymatic is 
its ability to encode 256 unique characters 
without the use of an escape code. The layout 
of the keyboard is designed according to the 
user's specifications. The MARC character set, 
consisting of 175 graphic characters, could be 
assigned keys in clusters. One cluster might 
include special characters and diacritical 
marks, for example, and another cluster might 
contain uppercase and lowercase alphabetic 
characters. Common "words" such as MARC 
tags could be assigned to single keys (called 
expandables) and translated to their proper 
value by software, thus reducing the amount 
of stroking required. 
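
The expandable-key idea amounts to a table lookup performed by software after keying. A minimal sketch follows. The key codes and the tag assignments shown are hypothetical, chosen only for illustration; the actual Keymatic program and the Library's key assignments are not described in this report.

```python
# Hypothetical assignment of single key codes to frequently used MARC tags.
# The code values (0xE0 and up) are invented for this example.
EXPANDABLES = {
    0xE0: "100",  # main entry, personal name
    0xE1: "245",  # title statement
    0xE2: "260",  # imprint
}

def expand(keystrokes):
    """Translate a stream of key codes into text.

    Codes found in EXPANDABLES are replaced by their full value;
    every other code is treated as an ordinary single character.
    """
    out = []
    for code in keystrokes:
        if code in EXPANDABLES:
            out.append(EXPANDABLES[code])
        else:
            out.append(chr(code))
    return "".join(out)

# One expandable stroke followed by six ordinary strokes.
keyed = [0xE1] + [ord(ch) for ch in " Title"]
print(expand(keyed))  # 245 Title
print(len(keyed), "strokes produced", len(expand(keyed)), "characters")
```

Each expandable saves only two strokes for a three-character tag, so the benefit depends on how quickly the typist can locate the key.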

In addition to the flexibility provided by the 
256 characters on the keyboard, the machine 
offers the following advantages: 1) data are 
recorded directly on computer-compatible mag- 
netic tape; 2) correction procedures are built 
into the device, i.e., the ability to delete a 
character, word, sentence, or entire record; 
and 3) the single character display screen 
obviates the necessity for hard copy. It is often 
claimed that hard copy output is scanned by 
the typist unintentionally, to the detriment of 
typing speed. 

The keyboard of the machine tested was 
designed specifically for the Library's require- 
ments. Four separate keyboards contained 184 
keys of which 103 had uppercase and lower- 
case capability and 81 had only a single case. 
Although 287 codes could be represented, only 
256 were used, with some keys representing 
the same codes. The codes were divided into 
the following categories: 1) 94 were used as 
expandables and assigned to those MARC tags 
and correction and modification commands 
that are used most frequently; 2) 10 were 
used as machine function codes; 3) 150 were 
assigned unique values in the MARC character 
set; and 4) two were left unused. 
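
The key and code counts in this paragraph can be checked against one another. The short sketch below uses only the figures quoted in the text; the variable names are illustrative.

```python
# Key counts quoted in the text.
dual_case_keys = 103    # keys with uppercase and lowercase capability
single_case_keys = 81   # keys with only a single case

total_keys = dual_case_keys + single_case_keys
representable_codes = 2 * dual_case_keys + single_case_keys

# How the 256 codes actually used were divided.
allocation = {
    "expandables": 94,
    "machine function codes": 10,
    "MARC character set values": 150,
    "unused": 2,
}
codes_used = sum(allocation.values())

print(total_keys, representable_codes, codes_used)  # 184 287 256
```

The difference between the 287 representable codes and the 256 used corresponds to keys that duplicate a code found elsewhere on the keyboards.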

The keys on the four keyboards were as- 
signed values so that the most frequently used 
keys were located in a strong stroke area. To 
keep additional training of the typists to a 
minimum, the main character keyboard was 
designed to correspond closely to that of the 
MTST. Practice was required only for the ex- 
pandable keys and some of the less frequently 
used special characters. The keyboard layouts 
are shown in figures 7-1, 7-2, 7-3, and 7-4. The 
program supplied by the manufacturer was 
modified for code conversion and output for- 
mat acceptable to the MARC system. 

The two typists selected to participate in the 
test were both experienced MARC production 
typists. Each typist was given individual in- 
struction on the machine and spent approxi- 
mately seven days over a three-week period 
practicing. During the actual test period, the 
typists spent two weeks working full time on 
the machine. Their production rates increased 
from 6-7 records per hour at the beginning of 
the practice period to 11-12 records per hour 
at the end of the test period. 

Each typist reported on problems that arose 
during the evaluation. One complication was 
the hesitation when the typist had to decide 
whether to use an expandable key or actually 
type the data, character by character. If she 
chose the former, the expandable key had to 
be found. The large number of tags and their 
different combinations caused some confusion. 
The opinion of both typists concerning the 
keyboard arrangement was that they would 
rather type the tags character by character 
than search for the expandable key. More expe- 
rience on the device might eliminate this 
problem. 

The absence of hard copy, although consid- 
ered beneficial to typing speed, proved to be a 
handicap for this test. Under current input 
procedures when a typist thinks that she has 
made a typing error, she checks the hard copy 
to verify that a mistake has actually been 
made before taking corrective action. The ab- 
sence of hard copy precluded such verification, 
and the typists reported that this detracted 
from their efficiency. 

Table 7-1. Characteristics of keyboard devices, May 1970 

The Keymatic model used for the test rents 
for $768.25 per month (July 1970 pricelist). 
It is a fully equipped model with several op- 
tions not required for the MARC system so that 
a less expensive model could be used. Keymatic 
does have a 24-month lease plan under which 
the basic machine could be rented for $368.00 
per month. This would be an increase of 
$258.00 per month per machine over the cost 
of the current method of input. 

Average production rates were computed for 
the same two typists for the Keymatic and the 
MTST. Although the same records were not 
actually typed on the MTST, experience with 
production and error rates on that device has 
been extensive so that it was considered valid 
to use existing MTST figures for the 
comparison. 

Figure 7-2. Keymatic master keyboard. 

In computing the cost per record, the hourly 
cost per machine was calculated by dividing 
the monthly cost per machine by 132 working 
hours. The 24-month leasing price of $368.00 per 
month was used for the Keymatic, resulting in 
a machine cost per hour of $2.79. The MTST 
rental cost is $110.00 per month, resulting in an 
hourly cost of $.83. A converter is required to 
translate MTST output to computer-compatible 
tape, adding an incremental cost to each input 
device. The monthly rental cost of the conver- 
ter is $260.00. For this report, the total num- 
ber of MTST devices producing input for the 
converter (the Library has 10 MTST devices, 
including the six used for MARC/RECON) was 
used as a base figure. Addition of the prorated 
converter cost of $.20 per hour to the MTST 
cost of $.83 resulted in a total hourly cost of 
$1.03 for the MTST. On the basis of 12.1 rec- 
ords per hour typed on the Keymatic and 14.6 
records per hour on the MTST, the cost per 
record is $.23 for the Keymatic and $.07 for 
the MTST. 
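
The cost comparison above can be retraced step by step. The sketch below is an illustration only: it takes the rental prices, the 132-hour working month, the 10-device converter share, and the production rates from the text, and the function names are assumptions introduced for the example.

```python
HOURS_PER_MONTH = 132  # working hours per month used in the report

def hourly_cost(monthly_rental, hours=HOURS_PER_MONTH):
    """Spread a monthly rental over the working hours in a month."""
    return monthly_rental / hours

def cost_per_record(cost_per_hour, records_per_hour):
    """Equipment cost attributable to one keyed record."""
    return cost_per_hour / records_per_hour

# Keymatic: 24-month lease price, 12.1 records per hour.
keymatic_hourly = hourly_cost(368.00)
keymatic_per_record = cost_per_record(keymatic_hourly, 12.1)

# MTST: rental plus a one-tenth share of the converter
# (10 MTST devices feed it), 14.6 records per hour.
mtst_hourly = hourly_cost(110.00) + hourly_cost(260.00 / 10)
mtst_per_record = cost_per_record(mtst_hourly, 14.6)

print(f"Keymatic: ${keymatic_hourly:.2f}/hr, ${keymatic_per_record:.2f}/record")
print(f"MTST:     ${mtst_hourly:.2f}/hr, ${mtst_per_record:.2f}/record")
```

Rounded to the cent, the results agree with the figures quoted above: $2.79 and $.23 for the Keymatic, $1.03 and $.07 for the MTST.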



Since the test indicated that the Keymatic, 
used in the LC environment, did not increase 
production rates, no savings in cost were dem- 
onstrated. The complexity of the data to be 
typed and the construction and quality of the 
worksheet used at the Library impose severe 
constraints on all machines. In order to make 
a fair comparison between the Keymatic and 
the MTST, the manuscript card was used for 
the test rather than the printed card. Repro- 
duction of the manuscript card on the MARC/ 
RECON worksheet results in a source document 
that is difficult to work with, owing to the loss 
of legibility during the copying process, the 
position of tags in relation to content, and 
the combination of typed and handwritten 
data inserted by the catalogers. 

Keymatic does have a machine, the Model 
K-103, which has an 80-character visual dis- 
play option which might correct one of the 
objections raised by the typists, i.e., the ab- 
sence of hard copy. However, the other prob- 
lems described above still remain, and in addi- 
tion, the K-103 requires the use of a converter 
as does the MTST. 

Comparison of MTST and OCR Methods 

Keying of RECON records in the 1969 and 7 
series was performed by a contractor using an 
IBM Selectric typewriter equipped with an 
OCR font. The resulting hard copy was fed 
through a Farrington optical character reader. 
The contractor monitored and reported the 
production rates for his equipment, and these 
were compared with corresponding data for 
the MTST's used in the Library. 

The cost of the typewriter with the OCR font 
($500.00) was amortized over 40 months for a 
monthly cost of $12.50. If the typewriter were 
used 132 hours a month, the hourly cost would 
be $.10. The contractor reported a typing rate 
of 12 records per hour or $.01 per record for 
the typewriter. The service bureau rental cost 
for the contractor's Farrington scanner, which 
can read 10,000 lines per hour (or 556 records 
averaging 18 lines in length) is $50.00 per 
hour (or $.09 per record). The contractor's 
total equipment cost per record is $.01 for the 
typewriter with the OCR font, plus $.09 for the 
scanner, or a total of $.10. This cost is quite 
close to that for the MTST equipment of $.07 
per record. 
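
The OCR equipment cost can be retraced in the same way. The sketch below is illustrative; it assumes a purchase price of $500.00, which is the amount implied by $12.50 per month over 40 months, and takes the other figures from the text. The variable names are introduced only for the example.

```python
HOURS_PER_MONTH = 132

# Typewriter with OCR font: purchase price amortized over 40 months.
# 500.00 is the price implied by 12.50 per month for 40 months.
typewriter_monthly = 500.00 / 40
typewriter_hourly = typewriter_monthly / HOURS_PER_MONTH
typewriter_per_record = typewriter_hourly / 12  # 12 records typed per hour

# Farrington scanner at the service bureau rate:
# 10,000 lines per hour, records averaging 18 lines.
scanner_records_per_hour = 10_000 / 18
scanner_per_record = 50.00 / scanner_records_per_hour

ocr_per_record = typewriter_per_record + scanner_per_record

print(f"typewriter: ${typewriter_monthly:.2f}/month, "
      f"${typewriter_per_record:.3f}/record")
print(f"scanner:    {scanner_records_per_hour:.0f} records/hr, "
      f"${scanner_per_record:.3f}/record")
print(f"total:      ${ocr_per_record:.3f}/record")
```

The unrounded typewriter cost comes to a little under one cent per record, so the scanner accounts for nearly all of the $.10 total.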

Even if typing with OCR turned out to be 
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less expensive than with the MTST, this method 
would probably not be satisfactory. The Far- 
rington scanner is capable of reading only 
uppercase. Since bibliographic data contain 
fewer uppercase than lowercase characters, a 
shift character was typed in front of each 
uppercase character. The resulting hard copy 
was difficult to read and contained many typ- 
ing errors. In order to attain a degree of accu- 
racy comparable to that of the Library's typ- 
ists, the contractor found it necessary to proof 
and correct all records before returning them 
to the Library for further processing. Records 
typed on the MTST at the Library are not 
proofed before being processed by the compu- 
ter. Proofing of uncorrected OCR records might 
decrease the editor's production to a point that 
the higher manpower costs would more than 
offset any savings on equipment. OCR equip- 
ment with uppercase and lowercase capability 
is now becoming available, but it must be as- 
sumed that the rental on such equipment will 
be higher. 

Direct-Read Optical Character Readers 

An obvious advantage for the use of a direct- 
read optical character reader is elimination of 
the need for manual keying of the original 
input. With format recognition a proven tech- 
nique, the use of such a device has even greater 
possibilities. Bibliographic data could be read, 
edited, formatted according to MARC format 
specifications, and output as a proofsheet with 
a minimum of human intervention. 

There are two types of optical character 
readers available: 1) document readers, which 
read only a limited number of lines per docu- 
ment, e.g., a credit card; and 2) page readers, 
which are capable of reading an entire page. 
Because of the input requirements of biblio- 
graphic data, document readers were not con- 
sidered in this study. 

An OCR device accomplishes its recognition 
by flooding the document with light and ana- 
lyzing the reflection. Light patterns are cap- 
tured in a photomultiplier and converted into 
an electronic signal. In general, these signals 
are matched against either memory or logic 
circuitry, and a corresponding code configura- 
tion is output onto the desired medium, e.g., 
disk or tape. Each manufacturer has specific 
requirements for the type of paper used and 
style of printing recognized. 

Machines were considered as possible candi- 
dates if they were capable of processing upper- 
case and lowercase alphabetic as well as nu- 
meric characters, standard punctuation, and 
some special symbols. The special characters 
available, which vary among the manufactur- 
ers, can be separated into two categories: edit 
function characters, and those characters that 
cannot be categorized as alphabetic, numeric, 
or standard function characters, e.g., or 

Equipment produced by the following man- 
ufacturers was considered for the initial 
evaluation: 

Information International 
Mergenthaler-Linotype 
Control Data Corporation: 915 Page Reader 
IBM Corporation: Model 1287 
Farrington: Models 3050 and 3030 
Scan-Data: Models 100, 300, 200 
Philco-Ford, Inc.: General Purpose Reader 
Recognition Equipment, Inc.: Retina 
CompuScan: Model 370 

Although Mergenthaler-Linotype and Infor- 
mation International did not have any device 
commercially available at the time, each did 
have a machine in production. Two of the com- 
panies subsequently gave up their efforts to- 
ward direct-reading of the LC printed card 
because of the complexity of the data on the 
card. 

Investigation of the remaining devices 
revealed that all except the CompuScan and 
Scan-Data required keying of data before 
reading with the scanner. Some manufacturers 
claimed that their devices had the potential to 
read the LC printed card directly but required 
substantial funding for hardware and/or soft- 
ware development that was out of range for 
the RECON Pilot Project. As a result, tests 
could only be conducted on the CompuScan 
and Scan-Data. 

CompuScan Optical Character Reader 

The Model 370 CompuScan is a computer- 
directed flying-spot scanner which matches the 
scanned portions of a character with an elec- 
tronic character held in the core memory of 
the computer. The record set would have to 
be microfilmed according to the specifications 
required by the scanner. Since the scanner 
operates with negative film, a very dark back- 
ground with a very clear, white image is 
necessary. 

The manufacturer examined a sample of LC 
printed cards selected at random covering a 
10-year period and concluded that although the 
hardware would be sufficient to read the rec- 
ord set optically, a rather significant software 
effort would be necessary. 

The LC record set is not entirely composed 
of "mint" cards (cards printed from the metal 
of the original Linotype composition) but in- 
stead is a mixture of originals and reprints of 
the original. When the stock of the original 
printing is close to depletion, the card is re- 
printed by photographing the card and making 
duplicates by a photo-offset process. As this 
cycle is repeated, the card for any one title 
could be several generations removed from the 
original. In some instances, a microscopic ex- 
amination of the cards seemed to indicate that 
the matrices used on the Linotype were worn. 
Thus, what might appear as the same charac- 
ter to the naked eye would present a different 
pattern configuration to the scanner. 

The coarseness of the surface of the card 
itself may cause variations in the same char- 
acter. To achieve the archival standards re- 
quired by libraries, LC cards are printed on 
high-rag-content stock. The rough surface of 
the card does not affect readability for a hu- 
man but may cause variations in a given char- 
acter. Software must be written to handle 
these variant characters and to match them 
with characters in the core memory of the 
scanner. 

Another significant problem in dealing with 
LC cards concerns touching characters, a con- 
nection between what are intended to be dis- 
tinct characters but read by the scanner as 
one. For example, if a lowercase "n" were 
next to a lowercase "t" and the cross bar on 
the "t" touched the "n," the scanner would 
consider the combination of the "n" and the 
"t" as one character. Another module of soft- 
ware is required to set an allowable limit for 
reading a single character so that the machine 
will recognize touching characters as separate 
entities. When this limit is exceeded, the pattern 
must be divided and each section matched 
against a single character pattern held in core. 
A machine decision must then be made to 
identify the two patterns. 
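
The width-limit test and the division step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the CompuScan software: the bitmap representation (rows of "#" and "."), the function names, and the rule of cutting at the interior column with the least ink are all introduced for the example. Matching each piece against the stored character patterns is omitted.

```python
def column_ink(bitmap):
    """Count inked cells ('#') in each column of a rectangular bitmap."""
    width = len(bitmap[0])
    return [sum(row[x] == "#" for row in bitmap) for x in range(width)]

def split_touching(bitmap, max_width):
    """Return [bitmap] if it fits the single-character limit, else two pieces.

    The cut is made at the interior column with the least ink, on the
    assumption that the bridge between two touching characters is thin.
    """
    width = len(bitmap[0])
    if width <= max_width:
        return [bitmap]
    ink = column_ink(bitmap)
    cut = min(range(1, width - 1), key=lambda x: ink[x])
    left = [row[:cut] for row in bitmap]
    right = [row[cut + 1:] for row in bitmap]  # the bridge column is discarded
    return [left, right]

# Two three-column strokes joined by a one-cell bridge in the middle column.
blob = [
    "###.###",
    "###.###",
    "#######",
    "###.###",
]
pieces = split_touching(blob, max_width=4)
for piece in pieces:
    print("\n".join(piece))
    print()
```

In a fuller treatment each resulting piece would then be compared with the single-character patterns held in core, and the pair of best matches reported, with the output flagged for later checking.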

When variant or touching characters occur, 
the output on magnetic tape is flagged for later 
spot checking. In this way, the scanner can 
continue to operate at throughput speeds with- 
out human intervention. The resultant mag- 
netic tape would serve as input to the format 
recognition program to reformat the scanner's 
output into the MARC format. It has been esti- 
mated that the throughput speed of the 
CompuScan would be in the vicinity of 1,800 
cards per hour. 

The manufacturer offered to use originally 
printed LC cards to test the device without 
expending funds for software modifications. 
Twenty-five letterpress LC cards representing 
English language titles and containing no dia- 
critical marks were sent to the firm for input. 
Since existing CompuScan software was used 
for the test, only the portion of the LC card 
containing fonts already built into the existing 
configuration could be used. All data except the 
title paragraph (title through imprint) were 
blocked out before microfilming for subsequent 
scanning. 

Operator intervention was required for 1 to 
25 percent of the characters on each card. In 
addition to the problems described above, fine 
lines in certain characters caused a misread- 
ing of the character by the scanner, the letter 
"e," for example, being interpreted as the let- 
ter "c." CompuScan felt that this problem 
might be resolved by increasing the size of the 
comparison matrix of the hardware. Another 
problem encountered was that a period was 
generated in the middle of a word due to the 
coarseness of the card stock. 

Scan-Data 

Dissly Systems has effectively modified the 
Scan-Data optical character reader, via soft- 
ware, to read 55 different type fonts. The 
various fonts are recognized by a "best com- 
pare" technique using six stored fonts to match 
against the remaining 49. The manufacturer 
claims that direct-read is accomplished with 
accuracy levels of approximately 95 percent. 
Errors are flagged during a proofing cycle 
after the record is in machine-readable form 
and corrected in the machine data base. 

The Scan-Data equipment does not have a 
transport for a 3 x 5 document, and the LC 
cards must therefore be attached to an 8 x 14 
sheet for scanning. Since the manufacturer 
cannot return these cards to the Library, they 
would have to be taken from stock rather than 
from the record set. This constraint places 
severe limitations on the application of the 
Scan-Data since many cards are out of stock 
and those that are in stock may be second- or 
third-generation cards which, as indicated 
above, are not ideal candidates for direct-read 
scanning. 

Fifty good quality cards were submitted to 
Dissly Systems for a test run. Five of the 50 
were returned to the Library with an associ- 
ated printout. The results were not encourag- 
ing: many lines of text were missed and many 
characters misread. It should be noted that the 
experiment was run without any modifications 
to the existing machine and software. 

Cathode Ray Tubes 

Cathode ray tubes were not considered for 
original input but rather as an aid for the 
MARC correction and verification cycles. The 
CRT devices essentially fall into two categories: 
graphic terminals and alphanumeric termi- 
nals. A graphic terminal, best described as a 
line drawing unit, is used primarily in applica- 
tions involving the drawing of plans, schemat- 
ics, etc., with a minimum of associated text. 
An alphanumeric terminal is similar to a type- 
writer in that it can display alphanumeric and 
special characters. 

All CRT devices have the same basic operat- 
ing components: screen, memory, keyboard, 
character generator, and a set of electronics 
that tie the components together in a unit ca- 
pable of communication. Various options are 
available, including additional peripheral de- 
vices, expanded character sets, and editing 
features. Some devices are linked to mini- 
computers that allow data manipulation at the 
terminal. 

CRT specifications were developed for the LC 
MARC character set, both for keying and dis- 
play, and for editing functions which would 
allow insertion and deletion of characters, 
words, or lines. The device would have to com- 
municate with the LC hardware configuration 
and be adaptable to other major manufactur- 
ers' hardware in case of changes in the con- 
figuration in the future. Minimum require- 
ments were established for viewing areas, 
character size, character capacity of the 
screen, etc. 

Equipment was evaluated by matching the 
capabilities of a device against the specifica- 
tions. Few devices met the requirements, with 
the primary limitation being the character set. 
Most of the displays have a 64-character capa- 
bility, some offer 96 characters as an option, 
and a few are capable of expansion to 128 
characters. 

The two devices that conformed most closely 
to LC specifications were studied in greater 
detail. The first, the Irascope Model LTE, built 
by Spiras Systems, Inc., was developed in con- 
junction with the Ohio College Library Center. 
The limitation of the Irascope was the size of 
the character set. The device permits keying 
and transmission of 155 characters, but only 
128 characters can be displayed. 

The second device, the PDS-1, is manufac- 
tured by the IMLAC Corporation and has both 
graphic and alphanumeric capabilities. The 
standard equipment includes a 4K mini- 
processor; through the use of software, char- 
acters of any shape can be displayed. There 
are 196 characters that can be keyed, and the 
resultant 196 unique codes can be translated 
into any 196 shapes for display. Although pre- 
cise figures are not yet available, it was deter- 
mined that the cost of the PDS-1 is higher than 
that of the Irascope LTE. The final decision was 
to acquire the Irascope for use in MARC correc- 
tion cycles. 

Mini-Computers 

The Library also conducted an investigation 
to determine the feasibility and desirability of 
using mini-computer on-line for MARC input 
funetiuns (both original input and correc- 
tions). This study was performed with con- 
tractual support. Benefits which might result 
from converting MARC to an on-line system for 
input included, in addition to increased 
productivity:

1) Improved timeliness of data released to the 
MARC Distribution Service.

2) Savings in IBM 360 computer time required 
to process MARC records from the point of in- 
put to the stage at which they are declared 
error free and transferred to the master data 
base. Assumption of various input functions 
by the mini-computer would relieve the main 
computer of these functions, and the 360 would 
thus be required only to process verified rec- 
ords on the master data base. 

This survey, conducted in late 1969, was not 
intended to be all inclusive. Time and funding 
were limited, and since the mini-computer field 
is expanding rapidly, it was not possible to 
have surveyed the totality at any given cut-off 
point. Inquiries were directed to seven firms
known to manufacture and market mini-computers.
Six of the seven responded with
descriptions of devices that were considered
potentially applicable to MARC operations.
These included the Burroughs TC-500, Digital
Equipment Corporation PDP-8/I, Honeywell
DDP-516, IBM 1800, Interdata Model 4, and XDS
Sigma 3. Of these, the DEC PDP-8/I and the
Honeywell DDP-516 had the greatest potential
for meeting the requirements for MARC input 
processing. 

Most manufacturers offer programming support
on an individually negotiated contract
basis. All of the mini-computer manufacturers
covered in this study supplied an assembler as
well as debugging and editing routines. Several
provided a FORTRAN compiler and an operating
system; however, the minimum cost of a
system that supports compilers and operating
system storage is approximately $10,000. In
addition, the mini-computer may include only
a few standard features in its basic system,
and addition of the optional features necessary
for a given application can make the price substantially
higher than that quoted for the basic
system.

On the basis of this study, it was concluded 
that although the use of a mini-computer is
technically feasible in performing MARC input
functions, addition of a mini-computer to the 
present LC hardware configuration would not 
result in either technological or economic 
gains. Specifically, the processing load re- 
moved from the 360 computer by the mini- 
computer would not be sufficient to justify the 
added cost imposed by the latter system. 

The experience of processing MARC records
in the LC environment during the past several
years has indicated that there is no gain in
original input on-line but a great deal to gain
with on-line correction procedures. This fact
weakens considerably the argument in favor of
devoting a separate mini-computer subsystem
to original input and correction procedures,
since the correction aspect alone represents a 
much smaller load factor. 



In addition, the procedures under considera- 
tion include corrections to both the working 
files and the MARC master data base. Since the
Library's requirements for handling large files
and sophisticated access structures are beyond
the capabilities of a basic mini-computer at the 
present time, the extent to which the mini- 
computer could handle correction procedures 
is quite limited. 

Under the MARC input procedures in effect 
at the time of the mini-computer study, editing 
of the records was carried out manually. Be- 
cause editing is now being accomplished by 
means of the format recognition program, a 
reassessment of the mini-computer may be 
in order. Since the success of format recognition
depends on accurate typing, greater flexibility
in correcting simple typing errors before
processing would promote greater accuracy in 
machine editing. 



Notes 

¹ RECON Working Task Force. Conversion of Retrospective
Catalog Records to Machine-Readable Form: a
Study of the Feasibility of a National Bibliographic
Service (Washington, Library of Congress, 1969), p.
52-55.

Chapter 8



Microfilming Techniques



As part of the RECON Pilot Project, microfilming
techniques and their associated costs
were investigated in cooperation with the staff 
of the Library of Congress Photoduplication 
Service. The possibility of obtaining cost esti- 
mates for commercial microfilming was con- 
sidered but was finally rejected on the grounds 
that spending staff time to explain the project 
to a contractor could not be justified when the 
Library had its own fully qualified photodupli- 
cation laboratory. 

The RECON feasibility report recommended 
that priority be given to the conversion of 
records for English-language monographs 
from 1960 to 1968. It was noted, however, that
certain problems arise in connection with the 
use of the record set for any category of mate- 
rials, since this file is arranged by card series 
(year) and by sequential number within each
series. The file can be readily divided into one-year
segments from 1898 through 1968,¹ but
the card numbering system does not lend itself 
easily to a division of the file by language or 
form of material. 

The RECON report recommended that the 
record set be divided into categories according 
to conversion priority, the cards filmed, and 
the file then reassembled. It was considered
that selection of categories for conversion before
filming would be more efficient since fewer
cards would have to be filmed. Further study
during the pilot project indicated that the 
entire record set containing the category of 
material chosen for conversion should be mi- 
crofilmed and then culled for the titles to be 
converted. Although this method results in the 
filming of more cards, it presents the following 
advantages:

1) The microfilm copy, containing the records
for all languages and forms of material in the
series chosen, can be used again for any other 
category of conversion. The need for return- 
ing to and disrupting the arrangement of the 
record set to select another category is thus 
eliminated. 

2) The records can be filmed as found, with a 
minimum of intervention by the operator. Se- 
lection of a particular language or form of 
material would require an individual with a 
knowledge of bibliographic data, and the film- 
ing would be slowed down by the selection 
process.

3) The microfilm can be retained as a security 
copy of the record set.

4) The figure for the number of cards printed 
in a given year is more accurate than the figure
for the number of records representing a
category of material for the same time period;
hence, more reliable cost estimates could be
established. 

Certain ground rules were established for 
the actual filming process. The selected drawers
of the record set would be "frozen" for a
day or two before filming, i.e., cards known to
be out of the file would be refiled, and no cards
would be removed from the file while filming
was in process. The filming would take place 
during normal work hours. 

Once the decision was made to film first and
select later, it was necessary to ascertain the 
volume of cards to be used as a basis for cost 
estimates. Since Photoduplication Service cost 
estimates are firm for only a one-year period 
because of the effects of increases in salaries, 
cost of materials, etc., and because there would
be paper handling problems in managing large
quantities of worksheets for conversion, it appeared
reasonable to assume microfilming
rates in proportion to conversion rates rather
than attempting to project cost estimates for
filming the entire record set.

A volume of 100,000 cards for the year 1965 
was chosen as a base figure for computation.
It was estimated that one operator could film 
approximately 5,000 cards per day, and ap- 
proximately 20 working days would be 
required to film the collection of cards repre- 
senting one year of the record set. In preparing 
the cost estimates, it was assumed that quality 
control would be limited to inspection for tech- 
nical requirements only, with a spot check 
about every 300 images for camera operator 
errors. It would probably be less expensive to 
correct other errors as they were discovered
during the conversion process. It was antici- 
pated that these errors would not exceed one 
percent of the total number of exposures. 
There would be no inspection for bibliographic 
content nor would any attempt be made to 
guarantee file sequence, i.e., card numbers 
could be missing.²
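
The volume assumptions in the paragraph above reduce to simple arithmetic. The following sketch restates them for any card volume. The names `FilmingPlan` and `filming_plan` are hypothetical and do not come from the report; the defaults of 5,000 cards per day, one spot check per 300 images, and a one percent error ceiling are the report's own figures.

```python
import math
from dataclasses import dataclass


@dataclass
class FilmingPlan:
    working_days: int   # operator-days needed to film the cards
    spot_checks: int    # spot checks for camera operator errors
    max_errors: int     # ceiling on errors left for the conversion stage


def filming_plan(cards, cards_per_day=5000, check_interval=300, error_percent=1):
    """Restate the chapter's volume assumptions for a given number of cards."""
    return FilmingPlan(
        working_days=math.ceil(cards / cards_per_day),
        spot_checks=cards // check_interval,
        max_errors=cards * error_percent // 100,
    )


# The base figure used in the chapter: one year (1965) of the record set.
plan = filming_plan(100_000)
print(plan)
```

For the 100,000-card base figure this gives the 20 working days cited above; the spot-check and error counts follow from the same assumptions and are not stated in the report.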

Based on the method of filming before selec- 
tion and on the volume of cards cited above, 
cost estimates for the following alternative 
techniques were derived:

1) Microfilming for direct-read optical char- 
acter reader specifications. 

2) Microfilming for reader/printer specifica- 
tions. 

3) Microfilming for reader specifications.

4) Microfilming for Xerox Copyflo printout of 
LC printed cards overlaid on 8 x 10 1/2 inch
worksheet. 

The following definitions are given to help in 
understanding the techniques described: 

1) Planetary camera — a microfilm camera in 
which, during exposure, the film is held sta- 
tionary in a horizontal plane parallel to the 
item being copied. 

2) Rotary camera — a microfilm camera in 
which loose-sheet documents are transported
on the surface of a rotary drum past a lens
which records the document on a roll of film 
moving synchronously with the rotation drum 
at a speed equal to the reduction ratio. Al- 
though the unit cost per exposure is less for a 
rotary camera, the quality of the image may 
be inadequate for some purposes. 

3) Film — in all four techniques, the film used
is 16-mm negative microfilm. 

4) Reduction ratio — a numerical expression of 
the number of times a copy is smaller in size,
linearly, than the original from which it was
made; expressed either in diameters (e.g., 5X, 
14.5X, 20X) or as a ratio (e.g., 1:5, 1:14.5, 
1:20). 

5) Image position — the orientation of images 
on a roll which can be controlled by turning 
either the document or the camera head and 
adjusting the reduction ratio accordingly. 
There are two basic positions: horizontal
(IA), with the head of the image to the left of
the frame, and vertical (IB), with the head 
of the image at the top of the frame. 

6) Feed — the method of transporting the doc- 
ument to be filmed to the camera head. 

7) Paper stock — the type of paper used in 
restoring images to eye-legible copy (hard 
copy). 

8) Rate per exposure (microfilm) — unit price
per image for microfilming. 

9) Rate per exposure (print) — unit price per 
image for restoring film to eye-legible copy 
(hard copy). 
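
As an illustration of the reduction ratio defined in item 4, the sketch below computes the linear size of the filmed image of a bare 3 x 5 inch card at the 16X and 20X reductions used in this chapter. The function name `reduced_size` is hypothetical, the card dimensions are nominal, and the calculation ignores any worksheet form filmed with the card.

```python
from fractions import Fraction


def reduced_size(width_in, height_in, reduction):
    """Linear dimensions, in inches, of an image filmed at `reduction` diameters."""
    return Fraction(width_in) / reduction, Fraction(height_in) / reduction


for reduction in (16, 20):
    width, height = reduced_size(5, 3, reduction)
    print(f"{reduction}X: {float(width):.4f} x {float(height):.4f} in")
```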

Microfilming for OCR Specifications 

The study on input devices (Chapter 7) 
demonstrated that the present state-of-the-art 
is such that a direct-read OCR cannot be used 
to scan LC printed cards. The microfilming 
technique for the OCR is included in the present 
comparison on the chance that the capabilities 
of these devices may improve significantly in 
the future and to isolate problems that might 
arise in using the OCR for a large retrospective 
conversion project. It is assumed that procedures
for the use of the OCR would be as follows:

1) Microfilming of LC cards.

2) Automatic reading of the film and transfer
of the data in digital form to a magnetic tape.

3) Use of a format recognition technique to
create a machine record in the MARC format.

4) Printing of a bibliographic record on the
computer printer for proofing.

5) Selection of records for the category to be
converted.

6) Comparison of the computer-produced hard
copy record with the main entry in the Official
Catalog and updating of the record where
required.

7) Proofing of the computer-produced record 
against a hard copy source document for for- 
mat recognition errors. 

8) Subsequent file maintenance procedures. 

It should be noted that point 7 above as- 
sumes a source document in hard copy form for 
human readability. Comparison of the compu- 
ter-produced proofsheet with the microfilm 
copy of the LC card by using a microfilm 
reader would place such a significant burden 
on the editor that it seemed unrealistic to 
consider this procedure. 

CompuScan specifications for a density 
range of 1.6 to 1.8 were used as the norm for 
the requirements of OCR devices. Serious prob- 
lems would arise in using the same film on 
Xerox Copyflo or even for contact printing to 
positive film because the density of 1.6 to 1.8 is 
not ideal for reproducing LC printed cards. The 
existence of heavily inked small characters and 
fine lines on the cards requires holding density 
to the 1.3 to 1.35 range to reproduce all text. 
Film suitable for OCR requirements would thus 
have little value for printout purposes, and a 
second filming would be necessary to provide 
hard copy. The cost estimate given below does 
not include the cost of this second filming. 

It would not be feasible to employ a rotary 
camera for production of film suitable for OCR 
devices since it would not be possible to ensure 
alignment of each image. In fact, there would 
be no guarantee that even a small portion of 
the images would be at a right angle to the 
edge of the film if a rotary camera were used. 
It thus appears that OCR requirements demand
stop-motion photography and single-card exposure,
using a planetary camera.



Camera                           Planetary
Film                             16 mm
Reduction                        20X
Image position                   IA
Feed                             Hand
Rate per exposure for negative   $.02
Cost for 100,000 cards           $2,000.00



Microfilming for Reader/Printer Specifications 

This method assumes that the hard copy 
produced from the microfilm would be in the
form of a MARC/RECON input worksheet. During
the filming process, a form must be imposed
on the image in such a way that the
resultant film copy has the 3 x 5 card positioned
to the right of the worksheet. The use
of a reader/printer involves the following
procedures:

1) Microfilming of LC cards. 

2) Reading, via the reader, to select records for 
conversion; printing, via the printer, of selected 
records as hard copy source documents. 

3) Comparison of the hard copy source document
with the main entry in the Official Catalog
and updating of the record as required. 

4) Keying of the record. 

5) Use of format recognition to create a ma- 
chine record in the MARC format. 

6) Printing of a bibliographic record on the
computer printer for proofing.

7) Proofing of the computer-produced record 
against the hard copy source document for typ- 
ing or format recognition errors. 

8) Subsequent file maintenance procedures. 

A rotary type camera would not be suitable
for this technique since it does not provide
the means for controlling the image position
or for superimposing an input worksheet form
on each image. The use of a stop-motion camera,
with each image overlaid with an input
worksheet form, seems appropriate.

Camera                           Planetary
Film                             16 mm
Reduction                        16X
Image position                   IB
Feed                             Hand
Rate per exposure for negative   $.0235
Cost for 100,000                 $2,350.00

Microfilming for Reader Specifications 

This method does not provide hard copy, 
and its use would be unlikely because it makes 
keying and proofing extremely difficult. The 
following procedures would be required: 

1) Microfilming of the LC cards. 

2) Reading, via the reader, to select records for 
conversion; keying of selected records directly
from the screen of the microfilm reader.

3) Use of format recognition to create a machine
record in the MARC format.

4) Printing of a bibliographic record on the 
computer printer for proofing. 

5) Comparison of the computer-produced rec- 
ord with the main entry in the Official Catalog 
and updating of the record where required.

6) Proofing of the computer-produced record 
against the Official Catalog record for typing or 
format recognition errors. 

7) Subsequent file maintenance procedures. 

This method requires keying both before 
and after catalog comparison. Additional key- 
ing may also be necessary because of changes 
made to the record in the Official Catalog. Because
there is no hard copy source document,
the proofing must be done against either the 
Official Catalog record or the copy displayed 
on the microfilm reader. Since an input work- 
sheet is not produced in this method, a rotary 
camera may be used. A person reading the 
record on the microfilm reader would not be 
seriously hampered by uneven placement of 
the image on the screen. 



Camera           Rotary
Film             16 mm
Reduction        20X
Image position   IA
Feed             Automatic



Microfilming for Xerox Copyflo Printout 

This method of microfilming is employed for 
the sole purpose of providing hard copy source 
documents (in the form of the MARC/RECON
input worksheets). The following procedures
apply:

1) Microfilming of LC cards and production of 
worksheets. 

2) Selection of records for conversion from the 
worksheets. 

3) Comparison of the worksheet with the main 
entry in the Official Catalog and updating of the 
record as required. 

4) Keying of the record from the worksheet.

5) Use of format recognition to create a ma- 
chine record in the MARC format. 

6) Printing of a bibliographic record on the 
computer printer for proofing.

7) Proofing of the computer-produced record 
against the hard copy source document for typing
or format recognition errors.

8) Subsequent file maintenance procedures. 

The use of a rotary camera would not be prac- 
tical for the same reasons as discussed in con- 
nection with the microfilming for reader/ 
printer specifications. 



Camera                                   Planetary
Film                                     16 mm
Reduction                                16X
Image position                           IB
Feed                                     Hand
Paper stock                              20-lb sulfite
Paper size                               8 x 10 1/2 overall
Rate per exposure for negative           $.004
Cost for 100,000 cards                   $400.00
Rate per print (8 x 10 1/2),
  including microfilming                 $.07
Cost for 100,000                         $7,000.00

Investigation of a Technical Alternative 

A prototype mechanism developed by Devel- 
optron, Inc., was investigated and evaluated 
for purposes of retrospective conversion. The 
device consists of a scissor-type rig, approxi- 
mately 5 feet long and feet high, with a 
Leica 35-mm camera mounted at the top. The
apparatus is placed on a table top, and the
operator sits at the scissor point with a tray of 
catalog cards aligned along the lower blade. 
The camera, which is mounted on the upper 
blade, is either lowered into the tray or positioned
immediately above the tray at the option
of the operator. Cards can be filmed in
place or raised out of the tray for filming.

The vendor suggested that a 24X Kodak 
RV2-Starflite camera head be substituted for 
the Leica since 16-mm unperforated film used 
with the RV2 would result in better resolution. 
Thirty-five mm perforated film at a 4X reduc- 
tion was used in the demonstration. 

Although the device appeared to function 
adequately, its use did not appear to offer any 
cost advantage over conventional microfilming. 
The savings in microfilming and hard copy
costs would be offset by the slowness of the
process and the fact that it would have to be 
repeated each time another category of rec- 
ords was selected from the same segment of 
the file. The cost of mounting the hard copies 
on worksheets would also have to be taken into 
account. 

Conclusion 

Despite the higher unit cost, it appears that 
the best alternative is to film all cards in a
given series against a worksheet form to produce
hard copy (Xerox Copyflo printout) and
then to select the desired subset for conversion.
Since the use of projected microfilm images as 
source documents is impractical, the reader- 
only option can be eliminated. The OCR method 
could not be employed unless a device were 
developed that could accurately scan LC 
printed cards. Use of the reader/printer
method is a possibility, but the quality of hard
copy print would be inferior to that obtained
in the Xerox Copyflo printout. The cost of the
reader/printer method, which does not include
the cost of hard copy, varies with the device
selected and could well approach that of the
Xerox Copyflo method.
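
The cost figures quoted in this chapter are all the product of a per-exposure rate and the 100,000-card base volume. The sketch below recomputes them so that the techniques can be compared at other volumes. The names `RATES` and `filming_cost` are hypothetical; the rates are those quoted in the chapter, and the reader-only technique is left out because no rate is quoted for it.

```python
from decimal import Decimal

# Per-exposure rates in dollars, as quoted in this chapter.
RATES = {
    "OCR (negative only)": Decimal("0.02"),
    "Reader/printer (negative only)": Decimal("0.0235"),
    "Xerox Copyflo (print, including microfilming)": Decimal("0.07"),
}


def filming_cost(rate_per_exposure, cards):
    """Cost in dollars of filming `cards` cards at a per-exposure rate."""
    return rate_per_exposure * cards


costs = {name: filming_cost(rate, 100_000) for name, rate in RATES.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}")
```

The OCR and reader/printer figures cover the negative only, so they understate the full cost of those techniques wherever hard copy is also needed.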



Notes 

¹ From 1969 until early 1972, cards in the record set
were arranged not by specific calendar year but rather
by numbers in the 7 series, with the second digit being
a check digit. The year-series numbering was resumed
in February 1972.

² Gaps in the sequence of card numbers sometimes exist
because certain numbers given to a publisher before
publication are not used; however, a gap could exist in
the file because a card was actually missing.

Appendix I

MARC Decisions for Retrospective Cataloging 



After the analysis of 5,000 research titles 
was completed, problems concerning catalog- 
ing and MARC editing procedures were brought 
to the attention of appropriate personnel at the 
Library of Congress. Based on the discussions 
with these staff members, decisions were made 
to handle the problems as follows: 

LC Card Numbers 

1) A single dagger after a card number, e.g.,
10-4173†, should be deleted.

2) A second hyphen followed by a digit, e.g., 
1-6360-1 or 1-6360-1 Revised, should be deleted 
and input as 1-6360//r38 (or whatever date
appears with the revision symbol). 

3) When "Revised" (but no revision date) fol- 
lows the card number, use the date of catalog- 
ing found on the verso of the Official Catalog 
card. 

4) When an asterisk follows the number, e.g., 
8-30156*, delete the asterisk. 

5) LC card numbers such as F-3144 should 
have an 01 added after the "F," e.g., F01-3144,
since 1901 was the year in which they were 
printed. 

6) For the present, card numbers with a leading
digit greater than "7," e.g., 99-1974, cannot
be input because a check digit error message
is generated. These records should be
held aside until the programs have been modified
to accept these card numbers.
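
The six decisions above can be read as a small normalization routine. The following is a minimal sketch under stated assumptions, not the Library's input program: the function name `normalize_card_number` and its return shape are hypothetical; decision 5 is generalized to any alphabetic prefix that lacks a series number; and decision 6 is read as applying only to two-digit series beginning with 8 or 9, which is an interpretation of the 99-1974 example rather than something the text states.

```python
import re

# Optional alphabetic prefix, optional series, serial, optional "-digit" suffix.
CARD_NUMBER = re.compile(r"([A-Za-z]*)(\d*)-(\d+)(?:-(\d+))?")


def normalize_card_number(raw, revision_year=None):
    """Apply decisions 1-6 to one printed card number.

    Returns (number, hold_aside). `revision_year` is the two-digit date that
    accompanies the revision symbol, or the cataloging date taken from the
    verso of the Official Catalog card (decision 3); the caller supplies it.
    """
    text = raw.strip()
    revised = "Revised" in text
    text = text.replace("Revised", "").strip()
    text = text.rstrip("†*")                      # decisions 1 and 4
    match = CARD_NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized card number: {raw!r}")
    prefix, series, serial, suffix = match.groups()
    if prefix and not series:
        series = "01"                             # decision 5: F-3144 -> F01-3144
    number = f"{prefix}{series}-{serial}"
    if suffix is not None or revised:             # decisions 2 and 3
        if revision_year is None:
            raise ValueError("a revision date is needed for this number")
        number += f"//r{revision_year:02d}"
    # Decision 6, read here as applying to two-digit series only (an assumption).
    hold_aside = len(series) == 2 and series[0] in "89"
    return number, hold_aside
```

A caller would set aside any record for which `hold_aside` is true and would look up the cataloging date before calling the function for a number marked "Revised" without a date.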

Main Entry 

1) Single surnames without forenames will be 
transcribed with three spaces following the
comma (A = 1 space), e.g., Dezauche, AAA.
When such names appear as added entries,
they will be transcribed with three spaces
rather than with a long dash, e.g., Dezauche,
would become Dezauche, AAA.

Title Added Entry Indicator 

1) Before 1912, the printed cards contained 
no indication of whether the titles should be 
traced. Although titles have been traced after
1912, these records do not have as many title 
tracings as they might under current practice. 
Input these records without adding any title 
added entry indicator. 

Title Statement 

1) Ellipses occurring at the beginning of the 
title should be removed unless they are printed 
as bold dots. Ellipses in the "c" subfield of the
title should also be removed. All other ellipses 
will be included in the record. 

2) Line endings, used to distinguish two editions
of a rare book, are indicated as two vertical
lines. Replace with A/A.

3) Input superscript or subscript alphabetic 
characters as regular lowercase alphabetic 
characters, except in formulas, e.g., 
A — B^"~^\ which is input as A — B 
* [superscript n] " 

4) An asterisk and a single dagger are used to
indicate birth and death dates of a person, e.g.,
*Chiquinquira, 21 de mayo de 1857. †Usiacuri,
7 de febrero de 1923, or von Norbert Klüken
und Karl Hoffmann†. If there is a birth/death
date phrase, delete it from the title statement.
If there is only a single dagger following a personal
name in an author statement, delete the
dagger.

Imprint Statement 

1) When the reprint statement has the appearance
of a double imprint, e.g., Bonnae, Apud
Henry & Cohen, 1856; Frankfurt/Main,
Minerva, 1967, the actual reprint statement
should be separated from the imprint by a period
instead of a semicolon. The reprint statement
will not be considered part of the imprint
field.

2) In cases where place of publication has the
appearance of a street address, e.g.,
72-Souligne-sous-Ballon, l'auteur, 1968, "72" is
actually the zip code zone and Souligne-sous-Ballon
is the name of the town. The two together
should be considered the place of
publication.

3) If neither a date of publication nor the
abbreviation [n.d.] is present, e.g., Praha, SNTL,
this is an error. Refer this record to the
cataloger. 

4) When an incomplete place name is given as 
the place of publication, e.g., Rio, Editora 
Simoes, 1956., this is an error. Refer such a 
record to the cataloger. (The example cited 
should read: Rio [de Janeiro].) 

5) When two places of publication are separated
by "and," e.g., New York and London;
by "und" or "u.," e.g., München u. Hannover;
or by a hyphen, e.g., Paris-Bruges or Milano- 
Roma-Napoli, delete the conjunction or hyphen 
and add a comma to separate the two place 
names. On German records, separation of two 
place names by a hyphen generally indicates 
that one place is located near a larger, better 
known place, e.g., Hamburg-Altona; such en- 
tries should be considered as a single place of 
publication. Occasionally, the hyphen is used 
on German records to indicate two places of 
publication. Such records should be given to a 
supervisor to check. On German records when 
two place names are separated by "bei" or
"b.", e.g., Ratingen b. Düsseldorf, they constitute
a single place of publication.

Collation Statement

1) When the statement "Cover title" is included
in the collation, e.g., Cover title, 36 p.
21 cm., delete "Cover title" from the collation
and make it the first note.

2) If a statement of illustration is given in the 
title paragraph but not in the collation, e.g., 
Hrsg. von Norbert Klüken. Mit 24 Abbildungen
und 13 Tabellen., leave the collation statement
as it is and do not add illustration codes
to the fixed field. 

3) When illustrative information is given as a 
general note, e.g., Maps on lining papers, do 
not add maps to the collation statement and 
do not add an illustration code to the fixed field. 

4) When a statement of reprint is included in 
the collation, e.g., reprint: 2 v. in 1. 29 cm.,
delete "reprint" from the collation. 

Series Statement 

1) Many older records carry information in 
the series statements which is not used in the 
Rules for Descriptive Cataloging in the Library
of Congress or Anglo-American Cataloging
Rules, e.g., Half-title: Library of philosophy.
Ed. by J. H. Muirhead. Delete the words
"Half-title:" or the editor phrase from the 
series statement. 

2) Some older records have what appears to be 
a "bound with" note transcribed in the series 
statement position and a series statement 
transcribed in a note position, e.g. : 

xxiv, 466 p. 17 cm. [With, as issued: Manetho, the
historian. Manetho. Cambridge, Mass., London, 1940]

Greek and English on opposite pages. 

Half-title: The Loeb classical library . . . Manetho. 
Ptolemy, Tetrabiblos 

Tag a field according to what it is rather than
where it is. 

General Notes 

1) Complex notes, e.g. : 

The following information regarding dates of publication
of each volume is supplied by Dr. C. Wardell
Stiles of the U.S. Department of Agriculture:
V. 1. Jan. 1886-6 May 1886.
V. 2. 13 May 1886-28 Oct. 1886.
V. 3. 4 Nov. 1886-21 Apr. 1887.
V. 4. 28 Apr. 1887-13 Oct. 1887.
V. 5. 20 Oct. 1887-5 Apr. 1888.
V. 6. 11 Apr. 1888-27 Sept. 1888.

[etc.]

Refer these records to a supervisor, who will 
in turn take them to the principal cataloger for
a decision. 

Subject Headings 

1) Subject headings from other libraries (old
cooperative copy): With the exception of those
records that contain the legend "Shared Cataloging
with DNLM" or "Shared Cataloging for
DNAL," only LC subject headings (including
Subject headings from other libraries should 
be deleted. 

a. Older records sometimes contain subject
entries that are composites of headings from
the Library of Congress and other libraries.
The other libraries' headings are in brackets.
Delete subject headings or parts of subject
headings that are enclosed in brackets, e.g.:

[1. Labor supply—Stat.—Russia] Delete the entire heading.
1. Fruit[—Hardiness] Delete only [—Hardiness]

This rule does not apply to LC children's headings
or cards with the legend: "Shared Cataloging
with DNLM" or "Shared Cataloging for
DNAL."

b. Some retrospective records contain portions 
of subject headings enclosed in subscript parentheses,
e.g.:

1. Wages—(Furniture workers)—United States.

Delete only the subscript parentheses. Retain
the data within.

c. Some retrospective subject headings con- 
tain both bracketed portions and portions en- 
closed within subscript parentheses, e.g.: 

1. Spraying and dusting residues (in agriculture)
[—Testing]

Delete the subscript parentheses around "in
agriculture," but retain the data; also delete
[—Testing], e.g.:

1. Spraying and dusting residues in agriculture. 
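
Rules a through c above amount to two text operations: drop anything in brackets, and strip subscript parentheses while keeping their contents. The sketch below illustrates this under stated assumptions. The function name `clean_subject_heading` and its parameters are hypothetical; the subscript parentheses are assumed to have been transcribed as the Unicode characters U+208D and U+208E so that they can be told apart from ordinary parentheses; and the caller is assumed to know whether a heading is an LC children's heading.

```python
import re

EXEMPT_LEGENDS = ("Shared Cataloging with DNLM", "Shared Cataloging for DNAL")
SUB_OPEN, SUB_CLOSE = "\u208d", "\u208e"   # stand-ins for subscript parentheses


def clean_subject_heading(heading, card_legend="", childrens_heading=False):
    """Return the heading to input, or None when the whole heading is deleted."""
    # Children's headings and shared-cataloging cards are left untouched.
    if childrens_heading or any(l in card_legend for l in EXEMPT_LEGENDS):
        return heading
    kept = re.sub(r"\s*\[[^\]]*\]", "", heading)              # drop bracketed parts
    kept = kept.replace(SUB_OPEN, "").replace(SUB_CLOSE, "")  # keep the data within
    kept = kept.strip()
    return kept or None
```

A heading that consists only of a bracketed entry comes back as `None`, which corresponds to deleting the entire heading.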

2) Personal names without dates used as subject
headings:

a. The ALA Cataloging Rules for Author and
Title Entries contains a list of personal names
that may be used as subject headings without
dates. Since this practice is no longer followed,
the following names should have dates added
when they are used as subjects.

Ariosto, Lodovico, 1474-1533.
Bach, Johann Sebastian, 1685-1750.
Bacon, Francis, viscount St. Albans, 1561-1626.
Balzac, Honoré de, 1799-1850.
Beethoven, Ludwig van, 1770-1827.
Boccaccio, Giovanni, 1313-1375.
Browning, Robert, 1812-1889.
Bunyan, John, 1628-1688.
Burns, Robert, 1759-1796.
Byron, George Gordon Noel Byron, 6th baron, 1788-1824.
Carlyle, Thomas, 1795-1881.
Cervantes Saavedra, Miguel de, 1547-1616.
Chaucer, Geoffrey, d. 1400.
Colombo, Cristoforo. [no dates on authority record]
Corneille, Pierre, 1606-1684.
Cromwell, Oliver, 1599-1658.
Dante Alighieri, 1265-1321.
Dickens, Charles, 1812-1870.
Eliot, George, pseud., i.e., Marian Evans, afterwards Cross, 1819-1880.
Goethe, Johann Wolfgang von, 1749-1832.
Goldsmith, Oliver, 1728-1774.
Hawthorne, Nathaniel, 1804-1864.
Heine, Heinrich, 1797-1856.
Hugo, Victor Marie, comte, 1802-1885.
Ibsen, Henrik, 1828-1906.
Irving, Washington, 1783-1859.
Lessing, Gotthold Ephraim, 1729-1781.
Lincoln, Abraham, Pres. U.S., 1809-1865.
Longfellow, Henry Wadsworth, 1807-1882.
Luther, Martin, 1483-1546.
Marie Antoinette, consort of Louis XVI, King of France, 1755-1793.
Milton, John, 1608-1674.
Molière, Jean Baptiste Poquelin, 1622-1673.
Mozart, Johann Chrysostom Wolfgang Amadeus, 1756-1791.
Napoléon I, Emperor of the French, 1769-1821.
Petrarca, Francesco, 1304-1374.
Pushkin, Aleksandr Sergeevich, 1799-1837.
Racine, Jean Baptiste, 1639-1699.
Rousseau, Jean Jacques, 1712-1778.
Ruskin, John, 1819-1900.
Schiller, Johann Christoph Friedrich von, 1759-1805.
Scott, Sir Walter, bart., 1771-1832.
Shakespeare, William, 1564-1616.
Spenser, Edmund, 1552?-1599.
Tasso, Torquato, 1544-1595.
Tennyson, Alfred Tennyson, baron, 1809-1892.
Thackeray, William Makepeace, 1811-1863.
Tolstoi, Lev Nikolaevich, graf, 1828-1910.
Voltaire, François Marie Arouet de, 1694-1778.
Wagner, Richard, 1813-1883.
Washington, George, Pres. U.S., 1732-1799.



Title Added Entries 

1) On some retrospective records, titles have
been inverted, e.g., I. Title: Retail Terms, A
manual of. These will be tagged as titles traced
differently (tag 740). Such a record will not
have a title added entry generated from the
title field.

Series Tracings

1) Before 1947, series statements were not
traced on the printed cards. The tracing (if
one were present) was recorded on the main
entry card in the Official Catalog. During catalog
comparison, check the verso of the main
entry Official Catalog card for a series tracing
for all records cataloged before 1947 and all
records cataloged after 1947 that do not have
an * after the card number. If the verso of
the main entry card contains a series tracing,
transcribe it on the input worksheet.

2) Limited cataloging records do not contain
series tracings. They can be identified by a
double dagger after the card number, e.g.,
54-49564‡. Leave these records as they are.
The double dagger following the card number
will be deleted and a substituted in its
place.
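
The two rules above amount to a small decision procedure. The following is a minimal sketch in Python, not part of the RECON procedures themselves. It assumes a hypothetical record representation (a cataloging year plus the card number string as printed), the function names are illustrative, and treating the year 1947 itself as "after 1947" is an assumption, since the text does not say on which side that year falls.

```python
DOUBLE_DAGGER = "\u2021"  # printed after the card number of a limited cataloging record


def is_limited_cataloging(card_number: str) -> bool:
    """Rule 2: a trailing double dagger marks a limited cataloging record."""
    return card_number.strip().endswith(DOUBLE_DAGGER)


def needs_verso_check(cataloging_year: int, card_number: str) -> bool:
    """Rule 1: decide whether to check the verso of the Official Catalog
    main entry card for a series tracing."""
    if is_limited_cataloging(card_number):
        return False  # limited cataloging records are left as they are
    if cataloging_year < 1947:
        return True  # series were not traced on printed cards before 1947
    # Assumption: 1947 and later are checked only when no asterisk
    # follows the card number.
    return not card_number.strip().endswith("*")
```

For example, `needs_verso_check(1937, "37-28551")` is true, while a card number ending in an asterisk or a double dagger on a later record yields false.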

Full Name Notes (also Secular Name, 
Name Originally, etc.) 

1) Some older retrospective records have full 
name notes recorded on the right hand side of 
the card between the tracings and the card 
number, e.g.:

1. London—Description. I. Title. 
[Full name: William Richard Gladstone Kent] 

37-28551 

Delete these notes. 



2) Asterisks preceding added entries indicate 
that the personal name has been revised. If 
this name were used as a main entry, a name- 
originally note would be present. Delete aster- 
isks before personal name added entries. 
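
Both deletions can be sketched as simple text filters. This is a hedged illustration in Python, assuming the lower part of a card is available as a list of text lines. The list of note labels is taken from the section heading, the position of the asterisk (directly after a Roman-numeral tracing number) is an assumption, and the sketch does not distinguish personal names from other added entries.

```python
import re

# Bracketed notes such as "[Full name: ...]"; the label list is an
# assumption drawn from the section heading.
NAME_NOTE = re.compile(
    r"^\s*\[(?:Full name|Secular name|Name originally):[^\]]*\]\s*$",
    re.IGNORECASE,
)

# An asterisk directly after a Roman-numeral tracing number (assumed position).
REVISED_NAME_ASTERISK = re.compile(r"(\b[IVXLC]+\.\s+)\*")


def clean_tracings(lines):
    """Drop name notes and strip asterisks that precede added entries."""
    cleaned = []
    for line in lines:
        if NAME_NOTE.match(line):
            continue  # rule 1: delete the full name note
        # rule 2: delete the asterisk, keep the added entry itself
        cleaned.append(REVISED_NAME_ASTERISK.sub(r"\1", line))
    return cleaned
```

The card number line and the tracings themselves pass through unchanged; only the bracketed note and the asterisks are removed.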



Copy Statement 

1) Copy statements without call numbers have 
been written or typed on some Official Catalog
main entry cards, e.g., 

Copy 2 

Copy 3 

Do not transcribe copy statements that are 
not printed on the printed card.



Copyright Number 

1) Copyright numbers have been added to the 
printed cards at various times. They are re- 
corded in the lower left hand corner, below the 
Library of Congress legend, e.g.: 

Copyright A 29724 

Delete the copyright number.



Diacritics 

1) Old German uses a small e instead of an
umlaut (¨) over a, o, and u, e.g., paͤbstern,
koͤnigen, fuͤrsten. Convert the e's to umlauts
(¨).
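
A minimal sketch of this conversion in Python follows. It assumes the source text encodes the small superscript e as the Unicode combining character U+0364, which is an assumption about the input encoding; the procedure described here was carried out by editors, and the function name is illustrative only.

```python
import unicodedata

SUPERSCRIPT_E = "\u0364"  # COMBINING LATIN SMALL LETTER E (assumed input encoding)
DIAERESIS = "\u0308"      # COMBINING DIAERESIS, i.e. the umlaut


def convert_old_german_umlauts(text: str) -> str:
    """Replace a small e written over a, o, or u with an umlaut."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == SUPERSCRIPT_E and out and out[-1] in "aouAOU":
            out.append(DIAERESIS)
        else:
            out.append(ch)
    # Recompose so that a + diaeresis becomes the single character ä, etc.
    return unicodedata.normalize("NFC", "".join(out))
```

Text without the superscript e, such as the ordinary spelling "Goethe", passes through unchanged, because only the combining character is replaced and a plain letter e is never touched.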

Index



Card selection: criteria for selection for conversion, 7;
from card stock, 7-8; print index comparison, 7

Catalog comparison: certification code, 10; cost, 10;
data elements affected, 10-11; justification, 5,
10-11; machine-readable records (pre-MARC
Distribution Service), 6; methods, 9-10; number of
changes, 10-11; older and foreign-language
records, 11; staff, 9

Cataloging and editing decisions: collation statement,
45; copy statement, 47; copyright number, 47;
diacritics, 47; full name notes, 47; general notes,
45-46; imprint statement, 45; LC card numbers,
44; main entry, 44; series statement, 45; series
tracings, 47; subject headings, 46; title added
entries, 46; title added entry indicator, 44; title
statement, 44

Cataloging rules and procedures: problems of changes
for conversion, 8; problems with research titles, 25

Cathode ray tube (CRT): character set and, 37;
description, 36-37; evaluation, 37; specifications, 37

Centralized conversion: current records, 1;
retrospective records, 1

Character set: cathode ray tube, 37; Keymatic Data 
System, 29 

CompuScan Optical Character Reader, 35-36 
Content designators: assignment by format recogni- 
tion, 12 

Conversion strategy: RECON Pilot Project, 5; RECON
study conclusions, 1-2, 5
Cost per record; see Unit costs
Council on Library Resources, Inc., 1-2
CRT; see Cathode ray tube

Developtron, Inc. [prototype device for filming catalog
cards], 42-43
Direct-read OCR; see OCR, direct-read
Dissly Systems; see Scan-Data 

Editing: foreign-language records, 25-27; format
recognition and, 12; retrospective vs. current records,
8; staff, 6; training, 6

Errors: cataloging/printing, 8; contractor, 8;
foreign-language editing test, 26; format recognition, 16,
18-19

Farrington optical scanner; see OCR scanner
Feasibility study; see RECON study
Fixed field codes [difficulty of assigning from printed
cards], 8

Foreign-language records, 3, 24; editing, 25-27; errors,
26; format recognition and, 19-20, 25; personnel
requirements, 27; similarity to older
English-language records, 25; source for research, 24-25;
see also Research titles

Format recognition, 2, 12; algorithms for, 13-14;
cataloging rules and, 25; core storage requirements,
15-16; cost, 19; errors, 16, 18-19; feasibility study,
12; foreign-language records and, 19-20;
International Standard Bibliographic Description and,
19-20; partially edited records, 12; peripheral
programs, 16; printed cards from, 19; processing
time, 16; production, 16; program structure, 14-15;
simulation, 14; specifications, 13; workflow, 16

Format recognition typing, 18; contractor test, 18-19;
cost of required accuracy level, 19; specifications,

Funding for conversion: RECON Pilot Project, 1-2, 7;
RECON study, 1

IMLAC Corporation; see PDS-1 [CRT] 
Input by contractor, 8; quality controls, 8; record
control, 8

Input devices, 3, 28; see also Keyboard devices
International Standard Bibliographic Description
(ISBD), 19-20
Irascope, 37

ISBD; see International Standard Bibliographic De- 
scription (ISBD) 

Key-to-cassette device, 28; see also Magnetic Tape
Selectric Typewriter (MTST); Keymatic Data
System

Key-to-computer-compatible-tape device, 28-29

Key-to-disk system, 28-29

Key-to-magnetic-tape system, 28-29

Keyboard devices: categories, 28-29; requirements, 28

Keymatic Data System: advantages, 29; character set,
29; cost, 31-32; keyboard, 99; test, 29-32; typing
problems, 29-31
LC Card Division card stock; see LC catalog records
LC Card Division popular titles: overlap with Main
Reading Room catalog, 24-25; source of research
titles, 24

LC Card Division record set: description, 5; micro- 
filming techniques and cost, 39-43; OCR and, 35- 
36; use in conversion, 5 

LC catalog records: assignment of fixed field codes 
from, 8; cataloging/printing errors, 8; changes in,
5, 10-11; comparison with Official Catalog, 5, 9- 
11; current records, 1; format recognition and, 2, 
12-20; legibility of source record, 8; microfilming, 
39-43; number, 5-6; OCR and, 35-36; reprinting, 
35 

LC Main Reading Room reference collection catalog:
description, 24; selection of research titles from,
24-25; source of research titles, 24
LC Official Catalog: comparison of worksheets with,
5, 9; conversion of, 5; description, 5
LC printed cards; see LC catalog records
Legibility of printed cards: for direct-read OCR, 35;
for worksheets, 8
Library of Congress: funding of conversion effort by, 2 

Machine-readable records (pre-MARC Distribution 
Service), 5-7; see also MARC I records; MARC 
II practice records 

Magnetic Tape Selectric Typewriter (MTST), 28; cost, 
22, 31-32 

MARC I records, 5-6 

MARC II practice records, 5-6 

MARC Distribution Service, 1 

MARC input programs, 11 

Microfilming, 3, 39; basis for estimating cost, 39-40;
definitions, 40; for OCR, 40-41; for reader, 42;
for reader/printer, 41-42; for Xerox Copyflo,
42-43; techniques, 40
Mini-computer, 37-38 

MTST; see Magnetic Tape Selectric Typewriter 
(MTST) 

Multiple Use MARC System (MUMS), 20 

OCR, direct-read: evaluation, 35; format recognition 
and, 34-35; specifications, 35; technology, 35; 
types, 35 

OCR scanner, 34; cost, 34; use by contractor, 8 



Ohio College Library Center [use of Irascope], 37 
Older English-language titles; see Research titles 
On-line input with mini-computer, 37-38 

PDS-1 [CRT], 37 

Popular titles; see LC Card Division popular titles 
Print index: categories of machine-readable records,
7; comparison of records selected with, 7
Production, 2, 5-6 

RECON Advisory Committee, 2 

RECON feasibility report; see RECON study 

RECON Pilot Project: establishment, 2; funding, 1-2,
7; objectives, 2-3
RECON study, 1-2

RECON Working Task Force: RECON Pilot Project,
2; RECON study, 1
Record set; see LC Card Division record set
Research titles, 2-3, 24; analysis of problems, 25;
foreign-language editing test, 25-27; personnel
requirements, 25; selection of records, 24-25;
sources of records, 24-25; see also
Foreign-language records

SBD; see International Standard Bibliographic De- 
scription (ISBD) 
Scan-Data, 36 

Shared cataloging; see Foreign-language records 
Spiras Model LTE; see Irascope 
Spiras Systems, Inc.; see Irascope 

Staffing: editors, 6; supervision, 7; typists, 7; verifiers,
6

Technical alternatives [RECON study], 21-22 
Training: editors, 6; typists, 7; verifiers, 6 
Two-up printing, 10 

Unit costs: catalog comparison, 10; differences from 
RECON study, 21-23; format recognition, 19, 22; 
Keymatic Data System, 31-32; microfilming, 41- 
42; MTST, 22, 31-32; OCR scanner, 34; simulated 
input costs, 21-22 

U.S. Office of Education, 2 

Xerox Copyflo, 42-43 